Azure / blobxfer

Azure Storage transfer tool and data movement library
MIT License

Retry policy did not allow for a retry #131

Closed — ErikvdVen closed this issue 2 years ago

ErikvdVen commented 2 years ago

Problem Description

I'm trying to copy all files from one container (or blob storage) to another. It took a while to arrive at the setup below, and it seems to work, but one issue bothers me: after running the script, the terminal is flooded with messages like the following:

2022-01-25 08:20:46.251962 azure.storage.common.storageclient ERROR %s Retry policy did not allow for a retry: %s, HTTP status code=%s, Exception=%s.
2022-01-25 08:20:46.264437 azure.storage.common.storageclient ERROR %s Retry policy did not allow for a retry: %s, HTTP status code=%s, Exception=%s.
2022-01-25 08:20:46.275297 azure.storage.common.storageclient ERROR %s Retry policy did not allow for a retry: %s, HTTP status code=%s, Exception=%s.
2022-01-25 08:20:46.284932 azure.storage.common.storageclient ERROR %s Retry policy did not allow for a retry: %s, HTTP status code=%s, Exception=%s.
2022-01-25 08:20:46.297475 azure.storage.common.storageclient ERROR %s Retry policy did not allow for a retry: %s, HTTP status code=%s, Exception=%s.
2022-01-25 08:20:46.312018 azure.storage.common.storageclient ERROR %s Retry policy did not allow for a retry: %s, HTTP status code=%s, Exception=%s.
2022-01-25 08:20:46.329865 azure.storage.common.storageclient ERROR %s Retry policy did not allow for a retry: %s, HTTP status code=%s, Exception=%s.

The errors come from the azure-storage-python package, which blobxfer uses under the hood. It seems to complain that no retry instance is set, but I'm not sure whether that is actually the case or how to prevent these errors from showing up.

The code I have so far:

# required imports (StorageModes lives in blobxfer.models.azure)
import blobxfer.api
from blobxfer.models.azure import StorageModes

general_options = blobxfer.api.GeneralOptions(
    concurrency=blobxfer.api.ConcurrencyOptions(
        transfer_threads=1,
        disk_threads=1,
        crypto_processes=1,
        md5_processes=1,
        action=3 # 3 = SyncCopy
    ),
    timeout=blobxfer.api.TimeoutOptions(
        connect=100,
        read=100,
        max_retries=3
    ),
    progress_bar=True
)

# construct specification
specification = blobxfer.api.SynccopySpecification(
    blobxfer.api.SyncCopyOptions(
        access_tier="Cool",
        delete_extraneous_destination=False,
        delete_only=False,
        dest_mode=StorageModes.Auto,
        mode=StorageModes.Auto,
        overwrite=True,
        recursive=True,
        rename=False,
        server_side_copy=False,
        strip_components=0,
    ),
    blobxfer.api.SkipOnOptions(
        filesize_match=False,
        lmt_ge=True,
        md5_match=False,
    ),
)

source_path = blobxfer.api.AzureSourcePath()

source_path.add_path_with_storage_account(
    remote_path=directorymapping.source,
    storage_account='source-sa',
)
specification.add_azure_source_path(source_path)

destination_path = blobxfer.api.AzureDestinationPath()
destination_path.add_path_with_storage_account(
    remote_path=directorymapping.destination,
    storage_account='destination-sa'
)
specification.add_azure_destination_path(destination_path)

credentials = blobxfer.api.AzureStorageCredentials(
    general_options=general_options
)
credentials.add_storage_account('source-sa', 'source-secret-key', 'core.windows.net')
credentials.add_storage_account('destination-sa', 'destination-secret-key', 'core.windows.net')

synccopy = blobxfer.api.SyncCopy(
    general_options,
    credentials,
    specification
)

synccopy.start()

Although all those error messages pop up, the blob files do seem to get transferred successfully! That's why I'm not sure whether blobxfer actually retries sending the files in the background, or just uses this call to check whether the file already exists and then creates it. Either way, if the error can be ignored, it would be better to prevent it from showing up at all.

ErikvdVen commented 2 years ago

I looked closer into the code, and it seems blobxfer first checks whether a file already exists at the destination. For that check it uses the azure-storage module, which logs an error when it cannot find the file.

Blobxfer creates the file afterwards, so when the script is run a second time the errors are gone. The error is still ugly, though, and makes it feel as if something went terribly wrong, while everything actually works fine. I would love to see a solution for this.

alfpark commented 2 years ago

Unfortunately this behavior comes, as you correctly pointed out, from the underlying dependency. There's not much that can be done on the blobxfer side unless you inject custom log filtering into your program.
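A minimal sketch of such log filtering with the standard `logging` module, assuming the noisy records all come from the `azure.storage.common.storageclient` logger shown in the output above and contain the phrase "Retry policy did not allow for a retry":

```python
import logging

class RetryNoiseFilter(logging.Filter):
    """Drop the noisy 'Retry policy did not allow for a retry' error records."""

    def filter(self, record):
        # Returning False suppresses the record; everything else passes through.
        return "Retry policy did not allow for a retry" not in record.getMessage()

# Attach the filter to the logger named in the error output,
# before calling synccopy.start().
logging.getLogger("azure.storage.common.storageclient").addFilter(RetryNoiseFilter())
```

Note that a filter attached to a logger only affects records emitted through that exact logger, so any other (potentially genuine) errors from azure-storage still show up.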