Azure / azure-storage-net-data-movement

Azure Storage Data Movement Library for .Net
MIT License

Support Statement

If you are looking for support for any feature in our new storage service versions (e.g. Blob, File, DataLake), please look to our V12 releases.

| SDK Name | Version | Description | NuGet/API Reference Links |
| --- | --- | --- | --- |
| Blob Storage SDK v12 for .NET | v12.0.0 | The next generation Blob Storage SDK. Supports sync and async IO. | NuGet - Reference |
| File Storage SDK v12 for .NET | 12.0.0-preview.5 | The next generation File Storage SDK. Supports sync and async IO. | NuGet - Reference |
| Data Lake Storage SDK v12 for .NET | 12.0.0-preview.6 | The next generation Data Lake Storage SDK. Supports sync and async IO. | NuGet |

Microsoft Azure Storage Data Movement Library (2.0.1)

The Microsoft Azure Storage Data Movement Library is designed for high-performance uploading, downloading, and copying of Azure Storage blobs and files. This library is based on the core data movement framework that powers AzCopy.

For more information about Azure Storage, please visit Microsoft Azure Storage Documentation.

Note: As of 0.11.0, the namespace has changed to Microsoft.Azure.Storage.DataMovement from Microsoft.WindowsAzure.Storage.DataMovement.

Features

Getting started

For the best development experience, we recommend that developers use the official Microsoft NuGet packages for libraries. NuGet packages are regularly updated with new functionality and hotfixes.

Target Frameworks

Requirements

To call Azure services, you must first have an Azure subscription. Sign up for a free trial or use your MSDN subscriber benefits.

Download & Install

Via Git

To get the source code of the SDK via git, run:

git clone https://github.com/Azure/azure-storage-net-data-movement.git
cd azure-storage-net-data-movement

Via NuGet

To get the binaries of this library as distributed by Microsoft, ready for use within your project, install them with the .NET package manager NuGet:

Install-Package Microsoft.Azure.Storage.DataMovement

Dependencies

Azure Storage Blob Client Library

This version depends on Azure Storage Blob Client Library

Azure Storage File Client Library

This version depends on Azure Storage File Client Library

Code Samples

Find more samples at the sample folder.

Upload a blob

First, include the classes you need. Here we include the Storage client library, the Storage data movement library, and .NET threading, because the data movement library provides Task-based asynchronous interfaces for transferring storage objects:

using System;
using System.Threading;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

Now use the interfaces provided by the Storage client library to set up the storage context (find more details at how to use Blob Storage from .NET):

string storageConnectionString = "myStorageConnectionString";
CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudBlobContainer blobContainer = blobClient.GetContainerReference("mycontainer");
blobContainer.CreateIfNotExists();
string sourcePath = "path\\to\\test.txt";
CloudBlockBlob destBlob = blobContainer.GetBlockBlobReference("myblob");

Once you have set up the storage blob context, you can use Microsoft.Azure.Storage.DataMovement.TransferManager to upload the blob and track the upload progress:

// Set the number of concurrent operations
TransferManager.Configurations.ParallelOperations = 64;
// Set up the transfer context and track the upload progress
SingleTransferContext context = new SingleTransferContext();
context.ProgressHandler = new Progress<TransferStatus>((progress) =>
{
    Console.WriteLine("Bytes uploaded: {0}", progress.BytesTransferred);
});
// Upload the local file to the destination blob
var task = TransferManager.UploadAsync(
    sourcePath, destBlob, null, context, CancellationToken.None);
// Block until the upload completes
task.Wait();

Copy a blob

First, include the classes you need. They are the same as in the Upload a blob sample:

using System;
using System.Threading;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

Now use the interfaces provided by the Storage client library to set up the storage contexts (find more details at how to use Blob Storage from .NET):

string sourceStorageConnectionString = "sourceStorageConnectionString";
CloudStorageAccount sourceAccount = CloudStorageAccount.Parse(sourceStorageConnectionString);
CloudBlobClient sourceBlobClient = sourceAccount.CreateCloudBlobClient();
CloudBlobContainer sourceBlobContainer = sourceBlobClient.GetContainerReference("sourcecontainer");
CloudBlockBlob sourceBlob = sourceBlobContainer.GetBlockBlobReference("sourceBlobName");

string destStorageConnectionString = "destinationStorageConnectionString";
CloudStorageAccount destAccount = CloudStorageAccount.Parse(destStorageConnectionString);
CloudBlobClient destBlobClient = destAccount.CreateCloudBlobClient();
CloudBlobContainer destBlobContainer = destBlobClient.GetContainerReference("destinationcontainer");
CloudBlockBlob destBlob = destBlobContainer.GetBlockBlobReference("destBlobName");

Once you have set up the storage blob contexts, you can use Microsoft.Azure.Storage.DataMovement.TransferManager to copy the blob and track the copy progress:

// Set the number of concurrent operations
TransferManager.Configurations.ParallelOperations = 64;
// Set up the transfer context and track the copy progress
SingleTransferContext context = new SingleTransferContext();
context.ProgressHandler = new Progress<TransferStatus>((progress) =>
{
    Console.WriteLine("Bytes copied: {0}", progress.BytesTransferred);
});

// Copy the source blob to the destination blob
var task = TransferManager.CopyAsync(
    sourceBlob, destBlob, CopyMethod.ServiceSideSyncCopy, null, context, CancellationToken.None);
// Block until the copy completes
task.Wait();

DMLib supports three different copying methods: Synchronous Copy, Service Side Asynchronous Copy and Service Side Synchronous Copy. The above sample uses Service Side Synchronous Copy. See Choose Copy Method for details on how to choose the copy method.

Best Practice

Increase .NET HTTP connections limit

By default, the .NET HTTP connection limit is 2, which means that only two concurrent connections can be maintained. This prevents your application from opening more parallel connections to Azure Blob storage.

By default, AzCopy sets ServicePointManager.DefaultConnectionLimit to eight times the number of cores. To get comparable performance when using the Data Movement Library on its own, we recommend that you set this value as well.

ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;

Turn off 100-continue

When the property "Expect100Continue" is set to true, client requests that use the PUT and POST methods will add an Expect: 100-continue header to the request and it will expect to receive a 100-Continue response from the server to indicate that the client should send the data to be posted. This mechanism allows clients to avoid sending large amounts of data over the network when the server, based on the request headers, intends to reject the request.

However, once the entire payload is received on the server end, other errors may still occur. If you have tested your client well enough to ensure that it does not send bad requests, you can turn off 100-continue so that the entire request is sent in one round trip. This is especially beneficial when sending small storage objects.

ServicePointManager.Expect100Continue = false;
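
The two settings above are process-wide, and a changed connection limit only affects ServicePoint objects created after the change, so it can help to apply both once at application startup, before the first storage request. A minimal sketch, assuming a hypothetical helper named TransferStartup.ConfigureHttp (this name is not part of DMLib):

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of DMLib.
public static class TransferStartup
{
    // Call once at application start, before any storage request is issued.
    public static void ConfigureHttp()
    {
        // Eight connections per core, the value the text above attributes to AzCopy.
        ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;

        // Send request bodies immediately instead of waiting for a 100-Continue response.
        ServicePointManager.Expect100Continue = false;
    }
}
```

Calling TransferStartup.ConfigureHttp() as the first statement of Main keeps the configuration in one place and makes the ordering explicit.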

Pattern/Recursive in DMLib

The following matrix explains how the DirectoryOptions.Recursive and DirectoryOptions.SearchPattern properties work in DMLib.

| Source | Search Pattern | Recursive | Search Pattern Example | Comments |
| --- | --- | --- | --- | --- |
| Local | Wildcard Match | TRUE | "foo*.png" | The search pattern is a standard wildcard match that is applied to the current directory and all subdirectories. |
| Local | Wildcard Match | FALSE | "foo*.png" | The search pattern is a standard wildcard match that is applied to the current directory only. |
| Azure Blob | Prefix Match | TRUE | <domainname>/<container>/<virtualdirectory>/<blobprefix>, e.g. "blah.blob.core.windows.net/ipsum/lorem/foo*" | The search pattern is a prefix match. |
| Azure Blob | Exact Match | FALSE | <domainname>/<container>/<virtualdirectory>/<fullblobname>, e.g. "blah.blob.core.windows.net/ipsum/lorem/foobar.png" | The search pattern is an exact match. If the search pattern is an empty string, no blobs will be matched. |
| Azure File | N/A | TRUE | N/A | Recursive search is not supported and will return an error. |
| Azure File | Exact Match | FALSE | <domainname>/<share>/<directory>/<fullfilename>, e.g. "blah.files.core.windows.net/ipsum/lorem/foobar.png" | The search pattern is an exact match. If the search pattern is an empty string, no files will be matched. |
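
To make the three matching modes in the matrix above concrete, here is a small sketch of what wildcard, prefix, and exact matching mean for a single name. The SearchPatternSketch class and its methods are hypothetical and only illustrate the semantics; they are not DMLib's implementation. The sketch is case-sensitive, whereas local file system matching may depend on the platform.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers that illustrate the matching modes; not part of DMLib.
public static class SearchPatternSketch
{
    // Local sources: wildcard match, where '*' matches any run of characters
    // and '?' matches any single character.
    public static bool WildcardMatch(string fileName, string pattern)
    {
        string regex = "^" + Regex.Escape(pattern)
            .Replace("\\*", ".*")
            .Replace("\\?", ".") + "$";
        return Regex.IsMatch(fileName, regex);
    }

    // Azure Blob with Recursive = true: the pattern is treated as a name prefix.
    public static bool PrefixMatch(string blobName, string prefix)
    {
        return blobName.StartsWith(prefix, StringComparison.Ordinal);
    }

    // Azure Blob or Azure File with Recursive = false: the pattern must equal
    // the full name, and an empty pattern matches nothing.
    public static bool ExactMatch(string name, string pattern)
    {
        return pattern.Length > 0 && string.Equals(name, pattern, StringComparison.Ordinal);
    }
}
```

For example, SearchPatternSketch.WildcardMatch("foo1.png", "foo*.png") and SearchPatternSketch.PrefixMatch("foobar.png", "foo") both return true, while SearchPatternSketch.ExactMatch("foobar.png", "") returns false.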

Choose Copy Method

DMLib supports three copy methods: Synchronous Copy, Service Side Asynchronous Copy, and Service Side Synchronous Copy.

The following copy methods are suggested for different scenarios:

The following table shows the supported copy directions for each copy method.

| | Append Blob | Block Blob | Page Blob | Azure File |
| --- | --- | --- | --- | --- |
| Append Blob | Synchronous Copy, Service Side Asynchronous Copy, Service Side Synchronous Copy | N/A | N/A | Synchronous Copy, Service Side Asynchronous Copy |
| Block Blob | N/A | Synchronous Copy, Service Side Asynchronous Copy, Service Side Synchronous Copy | N/A | Synchronous Copy, Service Side Asynchronous Copy |
| Page Blob | N/A | N/A | Synchronous Copy, Service Side Asynchronous Copy, Service Side Synchronous Copy | Synchronous Copy, Service Side Asynchronous Copy |
| File | Synchronous Copy, Service Side Asynchronous Copy | Synchronous Copy, Service Side Asynchronous Copy | Synchronous Copy, Service Side Asynchronous Copy | Synchronous Copy, Service Side Asynchronous Copy |
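
The support table above can also be read as a simple rule, which an application might encode to validate a copy request before starting it. The sketch below is one way to do that; StorageKind, CopyKinds, and CopySupportSketch are hypothetical names introduced for illustration and are not DMLib types.

```csharp
using System;

// Hypothetical types for illustration; they are not DMLib's own types.
public enum StorageKind { AppendBlob, BlockBlob, PageBlob, AzureFile }

[Flags]
public enum CopyKinds
{
    None = 0,
    Sync = 1,
    ServiceSideAsync = 2,
    ServiceSideSync = 4
}

public static class CopySupportSketch
{
    // Returns the copy methods that the table lists for a given pair of endpoints.
    public static CopyKinds Supported(StorageKind source, StorageKind dest)
    {
        // Any direction involving an Azure File: Synchronous and Service Side Asynchronous Copy.
        if (source == StorageKind.AzureFile || dest == StorageKind.AzureFile)
        {
            return CopyKinds.Sync | CopyKinds.ServiceSideAsync;
        }

        // Blob to blob of the same type: all three copy methods.
        if (source == dest)
        {
            return CopyKinds.Sync | CopyKinds.ServiceSideAsync | CopyKinds.ServiceSideSync;
        }

        // Blob to blob of different types: N/A in the table.
        return CopyKinds.None;
    }
}
```

For example, CopySupportSketch.Supported(StorageKind.BlockBlob, StorageKind.PageBlob) returns CopyKinds.None, matching the N/A cell for that pair.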

Need Help?

If you have trouble with the provided code, be sure to check out the Microsoft Azure Developer Forums on MSDN or ask on StackOverflow.

Collaborate & Contribute

We gladly accept community contributions.

For general suggestions about Microsoft Azure, please use our UserVoice forum.

Learn More