Azure / azure-sdk-for-cpp

This repository is for active development of the Azure SDK for C++. For consumers of the SDK we recommend visiting our versioned developer docs at https://azure.github.io/azure-sdk-for-cpp.
MIT License

DownloadTo function very slow #6205

Open stasasekulic opened 3 days ago

stasasekulic commented 3 days ago

Hi!

I've integrated the latest Azure SDK for C++ into my application and noticed that the DownloadTo function is very slow. Comparing it with the old SDK, I noticed that the old SDK used the Boost library and its DownloadTo could run asynchronously. I also noticed that the new SDK has a Concurrency parameter in the transfer options, but changing it did not affect the speed at all.

Is there a way to speed this function up? It's ~5x slower compared to the old SDK.

Update: When downloading the whole blob at once it is fast, but when it has to be downloaded partially, in ~10-100 MB chunks, it is much slower.

Thanks in advance!

ahsonkhan commented 2 days ago

What are the versions of the old and new SDKs you are comparing, where you noticed a performance difference? Are you installing azure-storage-blobs-cpp from vcpkg?

Could you share some more detail about what you are observing:

We don't depend on/use the boost library in our track 2 storage SDKs (the packages shipping out of this repo). Maybe you are referring to the older/track 1 SDKs based on cpprestsdk? https://github.com/Azure/azure-sdk-for-cpp/blob/b74d9c36be7f1e3b39de4767b2c26e06490a3d1c/sdk/storage/MigrationGuide.md#migration-benefits

https://learn.microsoft.com/en-us/azure/storage/blobs/quickstart-blobs-c-plus-plus?tabs=managed-identity%2Croles-azure-portal

stasasekulic commented 1 day ago
size_t BlobFile::Read(uint8_t* buf, size_t length)
{
    // Request only the next `length` bytes, starting at the current position.
    options.Range.Value().Length = length;
    options.Range.Value().Offset = current_position;

    auto downloadResponse = m_blob_client.DownloadTo(buf, length, options);

    // Number of bytes reported in the response's content range.
    auto read_bytes = downloadResponse.Value.ContentRange.Length.Value();

    if (read_bytes > 0) {
        current_position += read_bytes;
        if (current_position == blob_size) {
            is_EOF = true;
        }
    }
    else {
        is_EOF = true;
    }

    return read_bytes;
}

In the init I set the Concurrency option to 80, get the blob size, and so on. Nothing special. For auth I'm using an OAuth ClientSecretCredential, which I set up once at the start and then reuse.
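
A possible explanation for why Concurrency made no difference, offered as an assumption rather than something confirmed in this thread: DownloadTo appears to fetch the first TransferOptions.InitialChunkSize bytes of the requested range in a single request, and only the remainder is split into TransferOptions.ChunkSize pieces that are downloaded concurrently. If the defaults are 256 MiB and 4 MiB (recalled from the SDK headers, so please verify them for the version you build), a 10-100 MB range never reaches the concurrent part. The sketch below only does that arithmetic; EstimateRequestCount and the two constants are hypothetical names, not SDK API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// ASSUMED defaults of DownloadBlobToOptions::TransferOptions.
// Verify against the headers of the SDK version you actually build.
constexpr std::int64_t kAssumedInitialChunkSize = 256LL * 1024 * 1024;
constexpr std::int64_t kAssumedChunkSize = 4LL * 1024 * 1024;

// Hypothetical helper: estimates how many HTTP requests one ranged download
// of `rangeLength` bytes is split into, assuming one initial request of up to
// `initialChunkSize` bytes followed by `chunkSize`-sized requests for the rest.
std::int64_t EstimateRequestCount(
    std::int64_t rangeLength,
    std::int64_t initialChunkSize = kAssumedInitialChunkSize,
    std::int64_t chunkSize = kAssumedChunkSize)
{
    assert(initialChunkSize > 0 && chunkSize > 0);
    if (rangeLength <= 0) {
        return 0;
    }
    // Bytes left over after the single initial request.
    const std::int64_t remaining
        = rangeLength - std::min(rangeLength, initialChunkSize);
    // One initial request plus ceil(remaining / chunkSize) follow-up requests.
    return 1 + (remaining + chunkSize - 1) / chunkSize;
}
```

Under these assumptions, a 100 MiB range is served by 1 request, so there is nothing for Concurrency to parallelize; with the initial chunk size lowered to 4 MiB, the same range becomes 25 requests. If that matches your SDK version, lowering TransferOptions.InitialChunkSize might be worth trying.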

I'm aware that you are not using cpprestsdk or Boost; I'm using the latest Azure SDK for C++. I build the SDK manually and then use it.

After some investigation, it looks like "the problem" is that I am downloading the blob piece by piece and passing each piece to another layer. I'm not downloading the blob in one go, nor am I downloading it to a file. Also, when I compared with the old SDK, it had an OPEN_READ option, so the blob could also be downloaded in pieces, but it worked faster.
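
If the piece-by-piece pattern is the bottleneck, one option is to overlap the download of the next piece with the processing of the current one. The following is a minimal sketch of that idea using std::async, not the SDK's own mechanism: FetchFn is a hypothetical callback standing in for one ranged download (for example a wrapper around DownloadTo), and PrefetchingReader is an illustrative name. It assumes the callback fills exactly the requested number of bytes or throws, and that it is safe to call from a background thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for one ranged download: must write exactly `len`
// bytes starting at blob offset `offset` into `dst`, or throw on failure.
using FetchFn = std::function<void(std::int64_t offset, std::uint8_t* dst, std::size_t len)>;

class PrefetchingReader {
public:
    PrefetchingReader(FetchFn fetch, std::int64_t totalSize, std::size_t chunkSize)
        : m_fetch(std::move(fetch)), m_totalSize(totalSize), m_chunkSize(chunkSize)
    {
        assert(chunkSize > 0);
        StartPrefetch(); // begin downloading the first chunk immediately
    }

    // Returns the next chunk, or an empty vector once the end is reached.
    // While the caller processes the returned chunk, the following one is
    // already being fetched on a background thread.
    std::vector<std::uint8_t> NextChunk()
    {
        if (!m_pending.valid()) {
            return {};
        }
        std::vector<std::uint8_t> chunk = m_pending.get(); // rethrows fetch errors
        StartPrefetch();
        return chunk;
    }

private:
    void StartPrefetch()
    {
        if (m_nextOffset >= m_totalSize) {
            return; // nothing left; m_pending stays invalid
        }
        const std::int64_t offset = m_nextOffset;
        const std::size_t len = static_cast<std::size_t>(std::min<std::int64_t>(
            static_cast<std::int64_t>(m_chunkSize), m_totalSize - offset));
        m_nextOffset += static_cast<std::int64_t>(len);
        m_pending = std::async(std::launch::async, [fetch = m_fetch, offset, len] {
            std::vector<std::uint8_t> data(len);
            fetch(offset, data.data(), len);
            return data;
        });
    }

    FetchFn m_fetch;
    std::int64_t m_totalSize;
    std::size_t m_chunkSize;
    std::int64_t m_nextOffset = 0;
    std::future<std::vector<std::uint8_t>> m_pending;
};
```

The caller loops on NextChunk() until it returns an empty vector and hands each chunk to the next layer. Only one chunk is in flight at a time, so memory use stays at roughly two chunks; whether this recovers the speed of the old OPEN_READ path would need to be measured.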