Azure / azure-storage-net-data-movement

Azure Storage Data Movement Library for .Net
MIT License

AzCopy - Sparse to Non-Sparsed copy/download #139

Open iyerusad opened 6 years ago

iyerusad commented 6 years ago

Goal: Get managed disk from Azure to local file share, as a dynamically expanding VHDx.

Scenario: I have a 127GB sparse VHD (managed disk) stored in Azure. The actual contents of the VHD are significantly smaller (say ~20GB). I want to export that VHD. Every export method I have tried results in a 127GB file on the file system, including download (supposedly sparse) and copy (also supposedly sparse-enabled) using AzCopy.

Problems and Questions:

  1. Incorrectly sized download: The machine downloading the VHD doesn't necessarily have 127GB of disk space. AzCopy will not start downloading unless it can allocate 127GB.

  2. (When copying to another storage account) Would it be possible to have the file go from sparse to fixed for easier local download? Downloading the full size of the VHD would be prohibitive with restrictive data caps.

  3. Downloading in Windows to a mapped network share using the latest AzCopy: this allocated 127GB (expected), but also appeared to download 127GB (not expected). Is this because the underlying file system does not support sparse files, or something similar?

  4. A fixed-disk VHD stored as a sparse file is still a fixed disk. Is there a way to pipe the content of an AzCopy download into something along the lines of Convert-VHD?

blueww commented 6 years ago

@iyerusad

When the source is a VHD page blob whose actual content is much smaller than the blob size, the amount of data actually transferred will be a little larger than the actual content, but much smaller than the blob size.
AzCopy splits the source page blob into 4MB ranges and only downloads or copies the ranges that contain data. Empty ranges are not downloaded or copied.
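The range selection described above can be pictured with a small sketch. This is an illustration only, not AzCopy's or DMlib's actual code: the function names, the 64MB blob, and the populated ranges are all hypothetical, and the populated ranges are supplied by hand rather than queried from a storage service.

```python
from typing import List, Tuple

MB = 1024 * 1024
CHUNK_SIZE = 4 * MB  # the 4MB range size mentioned in the comment above

Range = Tuple[int, int]  # (start, end) in bytes, end exclusive


def split_into_chunks(blob_size: int, chunk_size: int = CHUNK_SIZE) -> List[Range]:
    """Split [0, blob_size) into consecutive chunks of at most chunk_size bytes."""
    return [(start, min(start + chunk_size, blob_size))
            for start in range(0, blob_size, chunk_size)]


def chunks_with_data(blob_size: int, populated: List[Range],
                     chunk_size: int = CHUNK_SIZE) -> List[Range]:
    """Keep only the chunks that overlap at least one populated range."""
    selected = []
    for start, end in split_into_chunks(blob_size, chunk_size):
        if any(p_start < end and start < p_end for p_start, p_end in populated):
            selected.append((start, end))
    return selected


# Hypothetical 64MB blob holding 3MB of data in two regions.
blob_size = 64 * MB
populated = [(1 * MB, 2 * MB), (7 * MB, 9 * MB)]

to_transfer = chunks_with_data(blob_size, populated)
transferred_bytes = sum(end - start for start, end in to_transfer)

print(len(split_into_chunks(blob_size)), "chunks in total")
print(len(to_transfer), "chunks transferred,", transferred_bytes // MB, "MB")
```

In this example 3MB of data touches three 4MB chunks, so 12MB would be transferred: more than the content itself, but far less than the 64MB blob. That matches the "a little larger than the actual content" behaviour described above.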

  1. For AzCopy, the source file and destination file must currently have the same size. So when the source blob is 127GB (even if the content is only ~20GB), the downloaded file still needs 127GB. To make sure the transfer can succeed, AzCopy allocates 127GB when the transfer starts.

  2. I don't know of a way to make the page blob go from sparse to fixed to reduce the blob size. But for copy, as mentioned above, AzCopy splits the blob into 4MB ranges and only copies the ranges that contain data.

  3. Actually, if your source page blob only has 20GB of content, the download size should be a little larger than 20GB. What makes you think AzCopy downloaded 127GB? The transfer speed and size shown by AzCopy include the empty ranges, but the empty ranges are not actually transferred. We show it this way because, when a user downloads a 1TB blob, it would look strange to report that 200GB was downloaded and the transfer finished successfully.

  4. Currently, AzCopy doesn't support piping the downloaded content, but DMlib can download the content to a stream. You can use this API to download a blob to a stream: https://azure.github.io/azure-storage-net-data-movement/html/b41dccd2-cc6a-5ce0-463b-31bcd5f150ce.htm
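To make points 1 and 3 concrete, here is a minimal sketch of writing only the populated ranges into a destination file whose length is set to the full source size up front. It is not AzCopy's implementation; `write_ranges`, the file name, and the sample data are hypothetical. Whether the unwritten regions consume physical disk space depends on the file system and on how the file is created, which this sketch does not control.

```python
import os
import tempfile
from typing import Dict


def write_ranges(path: str, total_size: int, ranges: Dict[int, bytes]) -> None:
    """Create a file of total_size bytes and write each range at its offset."""
    with open(path, "wb") as f:
        f.truncate(total_size)  # set the final length before any data arrives
        for offset, data in sorted(ranges.items()):
            f.seek(offset)      # jump over the empty region
            f.write(data)       # write only the bytes that were transferred


# Hypothetical 1MB "disk" with two small populated ranges.
total_size = 1024 * 1024
ranges = {0: b"header", 512 * 1024: b"payload"}

path = os.path.join(tempfile.mkdtemp(), "disk.bin")
write_ranges(path, total_size, ranges)

logical_size = os.path.getsize(path)
bytes_written = sum(len(data) for data in ranges.values())
print("logical size:", logical_size, "bytes written:", bytes_written)
```

The logical file size equals the source size even though only a few bytes were written, which is why the destination reports 127GB regardless of how much data actually crossed the network.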

iyerusad commented 6 years ago

> The transfer speed and size show by AzCopy contains the empty range size , but the empty ranges will not really transfer. We show that since when use download 1TB blob, it will be weird if we show 200GB is download, and transfer finished successfully.

Would it be possible to implement something similar to what rsync does for this scenario?

rsync (when used with the --progress flag and an incremental sync) outputs something like the following, showing the total size and the "speedup" achieved.

2018-07-18T19:23:54.7181422Z sent 37,875,923 bytes  received 73,537 bytes  5,248.52 bytes/sec
2018-07-18T19:23:54.7205034Z total size is 120,742,664  speedup is 3.18

This would help anyone using AzCopy understand that they are transferring sparse data efficiently.
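As a worked example, the rsync figures above are consistent with speedup being the total size divided by the bytes sent plus received. The sketch below reproduces that number and then applies the same ratio to the 127GB / ~20GB scenario. The `transfer_summary` function and its output format are hypothetical, not something AzCopy prints.

```python
GB = 1024 ** 3


def speedup(total_size: int, sent: int, received: int = 0) -> float:
    """Ratio of the logical size to the bytes that actually crossed the wire."""
    return total_size / (sent + received)


def transfer_summary(total_size: int, transferred: int) -> str:
    """Hypothetical summary line showing both sizes and the resulting ratio."""
    return (f"transferred {transferred / GB:.1f} GB of {total_size / GB:.1f} GB total, "
            f"speedup is {speedup(total_size, transferred):.2f}")


# Figures taken from the rsync output quoted above.
rsync_speedup = speedup(120_742_664, sent=37_875_923, received=73_537)
print(f"{rsync_speedup:.2f}")  # 3.18

# The scenario from this issue: a 127GB blob with roughly 20GB of content.
summary = transfer_summary(127 * GB, 20 * GB)
print(summary)
```

Reporting both numbers would keep the total blob size visible while making it clear how much data was really transferred.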

For the rest of the questions, for now I am setting up a staging storage account and then using Convert-VHD to convert the fixed disk to a dynamic disk VHDx, shrinking my 127GB to ~9GB. The DMlib option might be more graceful.

blueww commented 6 years ago

@iyerusad Good to know that you are not blocked now.

Thanks for the suggestion! Per my understanding, there are 2 suggestions:

  1. Show both the actual transfer size and the total size in the output.
  2. Add an option to DMlib that can do something like Convert-VHD.

For #2, as DMlib is for data transfer, I am afraid we won't do it in the near future. And since DMlib is an SDK, customers can write their own code based on DMlib and other SDKs to do this.

iyerusad commented 6 years ago

No worries. If #1 can be done, that would be a decent visual enhancement as well as a nice little metric to showcase the efficiency of tools like AzCopy.