Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
MIT License

AzCopy command taking a lot of time when copying a large number of small files. #2739

Open divyn10 opened 3 months ago

divyn10 commented 3 months ago

Which version of the AzCopy was used?

10.24.0

Which platform are you using? (ex: Windows, Mac, Linux)

Linux (running on a k8s pod having image peterdavehello/azcopy)

What command did you run?

azcopy copy [src blob with sas] [destination blob with sas] --overwrite=prompt --from-to=BlobBlob --s2s-preserve-access-tier=false --check-length=false --include-directory-stub=false --s2s-preserve-blob-tags=true --recursive=true --log-level=ERROR

What problem was encountered?

I am trying to copy across regions and subscriptions. There is one container with a large number of folders, each containing a single file. Each file is small (around 2-3 KB), but the number of such folders is huge (more than 4-5 million).

I am also using export AZCOPY_CONCURRENCY_VALUE=2000 as suggested here.

It is taking a lot of time. Is there any way to speed this up?

How can we reproduce the problem in the simplest way?

Have you found a mitigation/solution?

ashruti-msft commented 3 months ago

Speeding up AzCopy, especially when dealing with a large number of small files across regions and subscriptions, can be challenging due to the nature of the operation and the limitations of network latency and bandwidth. Can you check if azcopy is able to effectively utilize available bandwidth? Also, try upgrading to the latest version and see if it improves performance.
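
One way to check available throughput is AzCopy's built-in benchmark command, which transfers auto-generated test data and reports the throughput it achieved. A minimal sketch, assuming a scratch container URL with a SAS token; the account, container, file count, and file size below are placeholders chosen to mimic a small-file workload:

```bash
# Upload auto-generated test files to a scratch container and report the achieved throughput.
# By default the generated test data is deleted again when the run finishes.
azcopy benchmark "https://<account>.blob.core.windows.net/<scratch-container>?<SAS>" \
  --file-count 5000 \
  --size-per-file 3K
```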

divyn10 commented 2 months ago

@ashruti-msft Tried with the upgraded version as well. It is still taking a lot of time.

Below is the data on the container that I am trying to migrate.

Active blobs: 13,562,799 blobs, 116.62 GiB (125,221,568,871 bytes).
Snapshots: 0 blobs, 0 B (0 bytes).
Versions: 30,491,482 blobs, 351.84 GiB (377,783,994,902 bytes).
Deleted blobs: 4,261,418 blobs, 82.43 GiB (88,509,941,622 bytes).
Total: 48,315,699 items, 550.89 GiB (591,515,505,395 bytes).

tanyasethi-msft commented 2 months ago

Thanks @divyn10 for your response. You can take the following measures to optimize performance:

  1. Ensure that each job transfers fewer than a million files; AzCopy's job-tracking mechanism incurs a significant amount of overhead per job (see the sketch after this list for one way to split the transfer).
  2. Consider setting the --log-level parameter of your copy, sync, or remove command to ERROR.
  3. Set AZCOPY_CONCURRENT_SCAN to a higher number (Linux only).
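
For point 1, one rough way to keep each job small is to run several copy jobs, each scoped to a subset of the top-level folders. The sketch below assumes the folder names can be partitioned by their leading character via --include-regex (available in recent AzCopy versions); the prefixes, URLs, and concurrency values are placeholders, not values tested against this workload:

```bash
# Tune transfer and scan parallelism (values are illustrative; adjust to the pod's CPU/memory).
export AZCOPY_CONCURRENCY_VALUE=2000
export AZCOPY_CONCURRENT_SCAN=64

SRC='https://<src-account>.blob.core.windows.net/<container>?<SAS>'
DST='https://<dst-account>.blob.core.windows.net/<container>?<SAS>'

# Hypothetical partition: one job per leading character of the folder names,
# chosen so each job stays well under a million blobs.
for prefix in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
  azcopy copy "$SRC" "$DST" \
    --recursive=true \
    --from-to=BlobBlob \
    --overwrite=prompt \
    --s2s-preserve-access-tier=false \
    --check-length=false \
    --s2s-preserve-blob-tags=true \
    --log-level=ERROR \
    --include-regex "^${prefix}"
done
```

The exact partitioning scheme depends on how the folders are actually named; the point is only that each azcopy invocation becomes its own, separately resumable job that stays under the million-file mark.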