Azure / azure-sdk-for-cpp

This repository is for active development of the Azure SDK for C++. For consumers of the SDK we recommend visiting our versioned developer docs at https://azure.github.io/azure-sdk-for-cpp.
MIT License

DownloadTo function very slow #6205

Open stasasekulic opened 2 weeks ago

stasasekulic commented 2 weeks ago

Hi!

I've integrated the latest Azure SDK for C++ into my application and noticed that the DownloadTo function is very slow. After some comparison with the old SDK, I noticed that the latest SDK was using the Boost library and DownloadTo could run asynchronously. I also noticed that the new SDK has a Concurrency parameter in the transfer options, but changing it did not affect the speed at all.

Is there a way to speed this function up? It's ~5x slower compared to the old SDK.

Update: Downloading the whole blob at once is fast, but when it has to be downloaded partially, in ~10-100 MB chunks, it is much slower.

Thanks in advance!

ahsonkhan commented 2 weeks ago

What are the versions of the old and new SDKs you are comparing, where you noticed a performance difference? Are you installing azure-storage-blobs-cpp from vcpkg?

Could you share some more detail about what you are observing:

We don't depend on/use the boost library in our track 2 storage SDKs (the packages shipping out of this repo). Maybe you are referring to the older/track 1 SDKs based on cpprestsdk? https://github.com/Azure/azure-sdk-for-cpp/blob/b74d9c36be7f1e3b39de4767b2c26e06490a3d1c/sdk/storage/MigrationGuide.md#migration-benefits

https://learn.microsoft.com/en-us/azure/storage/blobs/quickstart-blobs-c-plus-plus?tabs=managed-identity%2Croles-azure-portal

stasasekulic commented 2 weeks ago

size_t BlobFile::Read(uint8_t* buf, size_t length)
{
    // Assumes options.Range was given a value during init.
    options.Range.Value().Length = length;
    options.Range.Value().Offset = current_position;

    // One ranged download request per Read() call.
    auto downloadResponse = m_blob_client.DownloadTo(buf, length, options);

    auto read_bytes = downloadResponse.Value.ContentRange.Length.Value();

    if (read_bytes > 0) {
        current_position += read_bytes;
        if (current_position == blob_size) {
            is_EOF = true;
        }
    }
    else {
        is_EOF = true;
    }

    return read_bytes;
}

In the init I set the Concurrency option to 80, get the blob size, and so on. Nothing special. For auth I'm using an OAuth ClientSecretCredential, which I set up once before the start and then reuse.

I'm aware that you are not using CPPREST or BOOST; I'm using the later Azure SDK for C++. I build the SDK manually and then use it.

After some investigation, it looks like "the problem" is that I am downloading the blob piece by piece and passing it to another layer. I'm not downloading the blob in one go, nor am I downloading it to a file. Also, when I compared with the old SDK, there was an OPEN_READ option with which a blob could also be downloaded in pieces, but it worked faster.

stasasekulic commented 1 week ago

The solution that seems to work for now is to download at least 4 MB into an internal buffer and then read from that buffer. Once the buffer is empty, I download a new chunk. Splitting into chunks that are too small degraded the speed.
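
For anyone landing here later, the buffering workaround described above could look roughly like the sketch below. This is only an illustration under assumptions, not code from the SDK or from the original poster: `BufferedRangeReader`, `FetchFn`, `Eof`, and the 4 MiB default are hypothetical names and values chosen for the example. The fetch callback stands in for whatever performs the ranged download (for instance a wrapper around the `DownloadTo` call shown earlier in the thread).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback type (not an Azure SDK type): downloads up to `len`
// bytes starting at `offset` into `dst` and returns the bytes written.
using FetchFn =
    std::function<std::size_t(std::uint64_t offset, std::uint8_t* dst, std::size_t len)>;

// Hypothetical helper: serves small reads from an internal buffer that is
// refilled with one large ranged fetch at a time.
class BufferedRangeReader {
public:
    BufferedRangeReader(FetchFn fetch, std::uint64_t total_size,
                        std::size_t min_fetch = 4 * 1024 * 1024)
        : fetch_(std::move(fetch)), total_size_(total_size), buffer_(min_fetch) {}

    // Copies up to `length` bytes into `out`; returns the number copied.
    std::size_t Read(std::uint8_t* out, std::size_t length) {
        std::size_t copied = 0;
        while (copied < length) {
            // Buffer drained: fetch the next large chunk, stop at end of data.
            if (buf_pos_ == buf_len_ && !Refill()) break;
            const std::size_t n = std::min(length - copied, buf_len_ - buf_pos_);
            std::memcpy(out + copied, buffer_.data() + buf_pos_, n);
            buf_pos_ += n;
            copied += n;
        }
        return copied;
    }

    // True once the buffer is drained and nothing is left to fetch.
    bool Eof() const { return buf_pos_ == buf_len_ && next_offset_ >= total_size_; }

private:
    bool Refill() {
        if (next_offset_ >= total_size_) return false;
        const std::size_t want = static_cast<std::size_t>(
            std::min<std::uint64_t>(buffer_.size(), total_size_ - next_offset_));
        buf_len_ = fetch_(next_offset_, buffer_.data(), want);
        assert(buf_len_ <= want);  // the callback must not overrun the buffer
        buf_pos_ = 0;
        next_offset_ += buf_len_;
        return buf_len_ > 0;
    }

    FetchFn fetch_;
    std::uint64_t total_size_;
    std::vector<std::uint8_t> buffer_;
    std::size_t buf_pos_ = 0;
    std::size_t buf_len_ = 0;
    std::uint64_t next_offset_ = 0;
};
```

With this shape, `BlobFile::Read` would forward to `BufferedRangeReader::Read`, and the callback would set the range offset and length and issue one download per refill, so many small reads share a single request. The buffer size is a tuning knob; the 4 MB figure comes from the comment above and may not be optimal for other workloads.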