Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
MIT License

azcopy sync --delete-destination=true takes too much time on Deleting extra object #2844

Open ppolushkin opened 3 weeks ago

Dear all,

I'm running the azcopy sync command to back up data from an Azure Data Lake to an Azure Storage account.

I'm using a command like this:

azcopy sync https://mydatalake.blob.core.windows.net/my-container/ https://mystorageaccount.blob.core.windows.net/my-container/ --recursive --log-level=NONE --delete-destination=true

My Azure Data Lake contains tens of millions of small and medium-sized files. Syncing and copying newly created files takes ~10 minutes, while deleting extra objects on the destination takes many hours.

Moreover, even though I specified --log-level=NONE, I see messages like the following for each removed file:

6142703 Files Scanned at Source, 6844507 Files Scanned at Destination, 2-sec Throughput (Mb/s): 0
INFO: Deleting extra object: DELTA/path/to/my/file.parquet

Questions:

1) Is it possible to delete files on the destination in batches?
2) How can I turn off the 'Deleting extra object' logging?
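
To make question 1 more concrete, here is a minimal sketch of the kind of batching I have in mind. It is not how AzCopy works internally, only an illustration. It assumes the names of the extra destination blobs are already known and groups them into chunks of at most 256, which is the subrequest limit of a single Blob Batch request. The names `chunked`, `BATCH_LIMIT`, and `extra_blobs`, as well as the blob paths, are made up for the example.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# A single Blob Batch request accepts at most 256 subrequests.
BATCH_LIMIT = 256


def chunked(names: Iterable[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` blob names."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(names)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical list of blobs that exist only on the destination.
extra_blobs = [f"DELTA/part-{i:05d}.parquet" for i in range(600)]

batches = list(chunked(extra_blobs))
print([len(b) for b in batches])  # [256, 256, 88]
```

With the azure-storage-blob Python package, each such batch could presumably be passed to `ContainerClient.delete_blobs(*batch)`, so that 600 deletions would need 3 requests instead of 600. I have not measured whether that would actually be faster in my setup.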

Details:

ubuntu:22.04

azcopy version 10.26.0

Environment variables:

AZCOPY_AUTO_LOGIN_TYPE: "SPN"
AZCOPY_TENANT_ID: "my-azure-tenant"
AZCOPY_SPA_APPLICATION_ID: "my-azure-client-id"
AZCOPY_SPA_CLIENT_SECRET: "secret"
AZCOPY_CONCURRENCY_VALUE: "3000"
AZCOPY_CONCURRENT_SCAN: "300"
AZCOPY_BUFFER_GB: "4"
AZCOPY_LOG_LOCATION: "/tmp"
AZCOPY_JOB_PLAN_LOCATION: "/tmp"

Kind regards,