This PR adds support for parallel uploads and parallel downloads (when using ranged gets). The change is generally useful, but it specifically targets Rails ActiveStorage performance improvements. For example, you can set up a Rails storage configuration like:
# config/storage.yml
local:
  service: ParallelAzureStorage
  storage_account_name: <%= ENV["AZURE_STORAGE_ACCOUNT"] %>
  storage_access_key: <%= ENV["AZURE_STORAGE_ACCESS_KEY"] %>
  storage_blob_host: <%= ENV["AZURE_STORAGE_BLOB_HOST"] %>
  container: <%= ENV["AZURE_STORAGE_BLOB_CONTAINER"] %>
  storage_blob_write_block_size: 10000000
  storage_blob_parallel_threshold: 75000000
  storage_blob_parallel_threads: 15
  http_pool_size: 20
# lib/active_storage/service/parallel_azure_storage_service.rb
# Once Rails natively supports setting a "stream_chunk_size", we can remove this
module ActiveStorage
  class Service::ParallelAzureStorageService < Service::AzureStorageService
    def stream(key)
      blob = blob_for(key)
      stream_chunk_size = 300.megabytes
      offset = 0

      raise ActiveStorage::FileNotFoundError unless blob.present?

      # Fetch the blob one ranged get at a time, yielding each chunk as binary
      while offset < blob.properties[:content_length]
        _, chunk = client.get_blob(container, key, start_range: offset, end_range: offset + stream_chunk_size - 1)
        yield chunk.force_encoding(Encoding::BINARY)
        offset += stream_chunk_size
      end
    end
  end
end
In that configuration, requests over 75MB would be fetched in parallel over 15 threads (each request getting 5MB). Larger streamed requests would be fetched 300MB at a time (stream_chunk_size = 300.megabytes) in 15 parallel threads (each request getting 20MB).
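The per-thread sizes above are just the ranged get divided evenly across the thread pool. Here is a minimal sketch of that arithmetic, not the code in this PR: `split_range` and `parallel_fetch` are hypothetical helper names, and the block passed to `parallel_fetch` stands in for a real ranged `client.get_blob` call.

```ruby
# Hypothetical helper (not the PR's implementation): split the inclusive byte
# range [start_range, end_range] into at most `threads` contiguous sub-ranges.
def split_range(start_range, end_range, threads)
  total = end_range - start_range + 1
  return [] if total <= 0

  part = (total + threads - 1) / threads # integer ceiling division
  (start_range..end_range).step(part).map do |first|
    [first, [first + part - 1, end_range].min]
  end
end

# Hypothetical helper: fetch every sub-range in its own thread, then join the
# parts in range order. `fetch` stands in for a ranged GET against the blob.
def parallel_fetch(start_range, end_range, threads, &fetch)
  split_range(start_range, end_range, threads)
    .map { |first, last| Thread.new(first, last) { |a, b| fetch.call(a, b) } }
    .map(&:value) # Thread#value waits for the thread and returns its result
    .join
end

# 75MB threshold over 15 threads: 15 ranges of 5,000,000 bytes each.
ranges = split_range(0, 75_000_000 - 1, 15)
puts ranges.length        # 15
puts ranges.first.inspect # [0, 4999999]

# 300.megabytes is 300 * 1024 * 1024 bytes outside of ActiveSupport.
chunk = 300 * 1024 * 1024
first, last = split_range(0, chunk - 1, 15).first
puts last - first + 1     # 20971520 (20 MiB)
```

To try the reassembly without Azure, an in-memory string can play the blob: `parallel_fetch(0, data.bytesize - 1, 7) { |a, b| data.byteslice(a, b - a + 1) }` returns `data` unchanged.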
Uploading and downloading larger blobs in parallel gives a significant performance boost (2x or 3x).
I also updated DEFAULT_WRITE_BLOCK_SIZE_IN_BYTES to 5MB to take advantage of high-throughput block blobs. From the Azure Blob Storage documentation:
If possible, use blob or block sizes greater than 4 MiB for standard storage accounts and greater than 256 KiB for premium storage accounts. Larger blob or block sizes automatically activate high-throughput block blobs. High-throughput block blobs provide high-performance ingest that is not affected by partition naming.
I don't think there's any real downside to defaulting to a value over 4MB, is there?
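For reviewers weighing that question, here is a rough sketch of the two numbers involved: whether a block size clears the thresholds quoted above, and how many blocks an upload is split into. Both method names are hypothetical and are not part of this PR or of the Azure SDK. Whether the new default is 5,000,000 or 5 * 1024 * 1024 bytes is not stated here, but either value exceeds 4 MiB.

```ruby
KIB = 1024
MIB = 1024 * KIB

# Hypothetical helper: thresholds come from the documentation quoted above
# ("greater than 4 MiB" for standard accounts, "greater than 256 KiB" for premium).
def high_throughput_block?(block_size, premium: false)
  block_size > (premium ? 256 * KIB : 4 * MIB)
end

# Hypothetical helper: number of blocks needed to upload `blob_size` bytes
# when every block except possibly the last holds `block_size` bytes.
def block_count(blob_size, block_size)
  (blob_size + block_size - 1) / block_size
end

puts high_throughput_block?(4 * MIB)          # false, the threshold is strict
puts high_throughput_block?(5_000_000)        # true
puts block_count(75_000_000, 10_000_000)      # 8
```

With the storage.yml values shown earlier (10,000,000-byte write blocks), a 75MB upload is split into 8 blocks, each of which is above the standard-account threshold.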