Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
MIT License

High number of calls to GetBlobProperties when running azcopy copy #2737

Open Pluggi opened 2 months ago

Pluggi commented 2 months ago

Which version of the AzCopy was used?

Note: The version is visible when running AzCopy without any argument
AzCopy 10.22.1

Which platform are you using? (ex: Windows, Mac, Linux)

Linux

What command did you run?

Note: Please remove the SAS to avoid exposing your credentials. If you cannot remember the exact command, please retrieve it from the beginning of the log file.
azcopy copy "https://${SRC}.blob.core.windows.net?${SAS_TOKEN}" "https://${DST}.blob.core.windows.net?${SAS_TOKEN}"

What problem was encountered?

We are seeing a lot of GetBlobProperties calls on our destination Storage Account whenever we run the command.

[Screenshot: 2024-06-25T13-27-54]

Two processes were started at midnight; one finished at 6:21 AM and the other at 10:38 AM. We would like to understand what these calls are used for and whether they could be removed, as they incur high costs (we have 15 storage accounts with the same pattern, costing us $100 per day).

Have you found a mitigation/solution?

I suspect the --s2s-preserve-properties flag could be the culprit. I am going to try disabling it and see what happens.

Pluggi commented 2 months ago

Setting --s2s-preserve-properties=false does not seem to change anything.
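For reference, the attempt described above can be sketched as follows. This is a minimal sketch, not the exact invocation used in the report: the default account names, the redacted SAS value, and the RUN switch are hypothetical placeholders added for illustration, and only the --s2s-preserve-properties=false flag comes from the thread.

```shell
# Hypothetical placeholder values; replace with real account names and a SAS token.
SRC="${SRC:-sourceaccount}"
DST="${DST:-destaccount}"
SAS_TOKEN="${SAS_TOKEN:-REDACTED}"

SRC_URL="https://${SRC}.blob.core.windows.net?${SAS_TOKEN}"
DST_URL="https://${DST}.blob.core.windows.net?${SAS_TOKEN}"

# Flag tried in the thread: do not preserve source properties during the service-to-service copy.
FLAGS="--s2s-preserve-properties=false"

# RUN is a hypothetical safety switch: the copy only starts when RUN=1 and azcopy is installed.
if [ "${RUN:-0}" = "1" ] && command -v azcopy >/dev/null 2>&1; then
  azcopy copy "$SRC_URL" "$DST_URL" "$FLAGS"
else
  # Dry run: print the command with the SAS token masked.
  echo "azcopy copy \"https://${SRC}.blob.core.windows.net?<SAS>\" \"https://${DST}.blob.core.windows.net?<SAS>\" ${FLAGS}"
fi
```

As reported above, this change did not noticeably reduce the number of GetBlobProperties calls.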

ashruti-msft commented 2 months ago

Hi, this is AzCopy's default behaviour, and there is no option to reduce the GetBlobProperties calls.

By default AzCopy uses parallel hierarchical listing for the Blob endpoint in order to speed up the listing process.

To reduce the I/O operations and cost, or to optimize for a flat structure, you can disable parallel hierarchical listing by setting the environment variable AZCOPY_DISABLE_HIERARCHICAL_SCAN to true. You can refer to this for more information.

Please note that doing this will impact performance. If performance is one of your priorities, this is NOT desirable, but if you prioritize saving on costs, it can be an option.
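The suggestion above could be applied roughly as follows on Linux. This is a sketch under stated assumptions: the environment variable name and the copy command come from the thread, while the default account names, the redacted SAS value, and the RUN switch are hypothetical placeholders.

```shell
# Hypothetical placeholder values; replace with real account names and a SAS token.
SRC="${SRC:-sourceaccount}"
DST="${DST:-destaccount}"
SAS_TOKEN="${SAS_TOKEN:-REDACTED}"

# Disable parallel hierarchical listing for every azcopy process started from this shell.
export AZCOPY_DISABLE_HIERARCHICAL_SCAN=true

SRC_URL="https://${SRC}.blob.core.windows.net?${SAS_TOKEN}"
DST_URL="https://${DST}.blob.core.windows.net?${SAS_TOKEN}"

# RUN is a hypothetical safety switch: the copy only starts when RUN=1 and azcopy is installed.
if [ "${RUN:-0}" = "1" ] && command -v azcopy >/dev/null 2>&1; then
  azcopy copy "$SRC_URL" "$DST_URL"
else
  # Dry run: show the setting and the command with the SAS token masked.
  echo "AZCOPY_DISABLE_HIERARCHICAL_SCAN=${AZCOPY_DISABLE_HIERARCHICAL_SCAN}"
  echo "azcopy copy \"https://${SRC}.blob.core.windows.net?<SAS>\" \"https://${DST}.blob.core.windows.net?<SAS>\""
fi
```

Because the variable is exported, it has to be set in the same shell (or the same cron or service environment) that launches azcopy; setting it in a different session has no effect on the transfer.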

Pluggi commented 2 months ago

Setting AZCOPY_DISABLE_HIERARCHICAL_SCAN does not seem to have made much of a difference unfortunately.

I have resorted to using azcopy sync for now, even though it uses a lot more memory.