Azure / azure-storage-cpp

Microsoft Azure Storage Client Library for C++
http://azure.github.io/azure-storage-cpp
Apache License 2.0

Configurable parameters to improve performance while reading page blobs #378

Open majumd opened 3 years ago

majumd commented 3 years ago

Hi, I would like to know how to improve performance when reading from a page blob. Are there any configurable parameters, such as the number of threads or the buffer size, that could be used to improve performance? An enhancement that makes these performance factors configurable, so they can be tuned per environment, would be helpful. Thanks, Udayan

Jinming-Hu commented 3 years ago

Hi @majumd, every blob API accepts a blob_request_options as a parameter. blob_request_options has a member function set_parallelism_factor with which you can set the max number of threads performing the download operation.

Jinming-Hu commented 3 years ago

You also mentioned buffer size. There are actually multiple data copies during the download process. For example, if you download a 100 MB blob, the 100 MB of data will be copied 2 or 3 times (I cannot remember exactly). Is this also something you want to optimize?

majumd commented 3 years ago

> Hi @majumd, every blob API accepts a blob_request_options as a parameter. blob_request_options has a member function set_parallelism_factor with which you can set the max number of threads performing the download operation.

Thanks for the response. I can see that the default value of the member variable m_parallelism_factor is 1. Could you please explain how it can be used to improve read performance from Azure?

Suppose we would like to read 40 MB of data. Could the value be set to 10 using set_parallelism_factor? Would that mean the 40 MB read ideally takes about as long as a 4 MB read, because 10 parallel requests would be made to Azure, each requesting 4 MB as per m_stream_read_size?

Jinming-Hu commented 3 years ago

> Suppose we would like to read 40 MB of data. Could the value be set to 10 using set_parallelism_factor? Would that mean the 40 MB read ideally takes about as long as a 4 MB read, because 10 parallel requests would be made to Azure, each requesting 4 MB as per m_stream_read_size?

Yes
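
For readers who want to check the arithmetic in the question, here is a minimal standard-library sketch of the idealized model. It is not code from azure-storage-cpp: `byte_range`, `split_ranges`, and `request_waves` are hypothetical names invented for this illustration. It assumes the read is cut into consecutive ranges of the stream read size and that up to `parallelism` ranges are in flight at once.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration; not part of azure-storage-cpp.
struct byte_range {
    std::uint64_t offset;  // first byte of the range
    std::uint64_t length;  // number of bytes in the range
};

// Split [0, total_bytes) into consecutive ranges of at most chunk_bytes each.
// Returns an empty vector if chunk_bytes is 0.
std::vector<byte_range> split_ranges(std::uint64_t total_bytes,
                                     std::uint64_t chunk_bytes) {
    std::vector<byte_range> ranges;
    if (chunk_bytes == 0) {
        return ranges;
    }
    for (std::uint64_t offset = 0; offset < total_bytes; offset += chunk_bytes) {
        const std::uint64_t remaining = total_bytes - offset;
        ranges.push_back({offset, remaining < chunk_bytes ? remaining : chunk_bytes});
    }
    return ranges;
}

// Number of sequential "waves" of requests needed when at most `parallelism`
// ranges are downloaded at the same time (ceiling division).
// A parallelism of 0 is treated as 1.
std::size_t request_waves(std::size_t range_count, std::size_t parallelism) {
    if (parallelism == 0) {
        parallelism = 1;
    }
    return (range_count + parallelism - 1) / parallelism;
}
```

Under this model, 40 MB with a 4 MB read size gives 10 ranges. A parallelism factor of 10 needs one wave of requests, while the default of 1 needs ten. This is the best case only: real throughput also depends on network bandwidth, service throttling, and the extra data copies mentioned earlier in the thread.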