Closed jkuhn-cuda closed 11 months ago
Hey @jkuhn-cuda thanks a lot for the PR, LGTM! Also, those performance improvements look great!
Thanks @samansmink ! I'm glad you are happy with it. Approximately how long should I expect it to take for the PR to be merged and to be available in an official released build? I need to use these new configuration values for a project I am working on and I need to determine if an official release build will be available in time. Otherwise, I will need to set up my project to use a custom build with these changes until an official build is available.
@jkuhn-cuda As soon as CI for this merge passes, the binary should be available in our nightly bucket, which you can install from DuckDB 0.9.2 using: force install azure from 'http://nightly-extensions.duckdb.org'
The read buffer size, transfer chunk size, and transfer concurrency options passed to the Azure Storage client have a large impact on both query duration and the number of transactions made against the Azure Storage account. This PR makes these values configurable so that users can tune them to balance performance against transaction costs.
The following shows the impact on the duration of a query against a 1.9 GiB gzipped JSON Lines blob:
| azure_read_transfer_concurrency | azure_read_transfer_chunk_size | azure_read_buffer_size | Duration (ms) |
| --- | --- | --- | --- |
| 5 | 1 MiB | 1 MiB | 69842.0027 |
| 1 | 1 MiB | 1 MiB | 64520.8366 |
| 4 | 32 MiB | 128 MiB | 46287.7139 |
| 16 | 8 MiB | 128 MiB | 35221.4137 |
| 16 | 16 MiB | 256 MiB | 29436.0231 |
The number of transactions required to do the query will be approximately: BlobSize / azure_read_transfer_chunk_size
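To make the estimate concrete, here is a minimal sketch of the arithmetic in Python. The helper name `estimate_transactions` is hypothetical and not part of the extension. Rounding up to a whole number of requests is an assumption; the real count may differ slightly depending on how the client issues its reads.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_transactions(blob_size_bytes: int, chunk_size_bytes: int) -> int:
    """Approximate transaction count: BlobSize / azure_read_transfer_chunk_size.

    Hypothetical helper for illustration only. Rounds up, on the assumption
    that a final partial chunk still costs one request.
    """
    if chunk_size_bytes <= 0:
        raise ValueError("chunk_size_bytes must be positive")
    # Integer ceiling division without going through floats.
    return -(-blob_size_bytes // chunk_size_bytes)


# Roughly the 1.9 GiB blob used in the measurements above.
blob_size = int(1.9 * GIB)

for chunk_mib in (1, 8, 16, 32):
    count = estimate_transactions(blob_size, chunk_mib * MIB)
    print(f"chunk size {chunk_mib:>2} MiB -> about {count} transactions")
```

Larger chunk sizes reduce the transaction count proportionally, which is the cost side of the trade-off against the durations measured above.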
In this PR I chose defaults for these values as follows: