Azure / azure-storage-ruby

Microsoft Azure Storage Library for Ruby
http://azure.github.io/azure-storage-ruby/

parallel upload, ranged get #215

buckelij commented 2 years ago

This PR adds support for parallel uploads and parallel downloads (when using ranged gets). The change is generally useful, but it specifically targets performance improvements for Rails ActiveStorage, so you can, for example, set up a Rails storage configuration like:

# config/storage.yml
local:
  service: ParallelAzureStorage
  storage_account_name: <%= ENV["AZURE_STORAGE_ACCOUNT"] %>
  storage_access_key: <%= ENV["AZURE_STORAGE_ACCESS_KEY"] %>
  storage_blob_host: <%= ENV["AZURE_STORAGE_BLOB_HOST"] %>
  container:  <%= ENV["AZURE_STORAGE_BLOB_CONTAINER"] %>
  storage_blob_write_block_size: 10000000
  storage_blob_parallel_threshold: 75000000
  storage_blob_parallel_threads: 15
  http_pool_size: 20

# lib/active_storage/service/parallel_azure_storage_service.rb
# Once Rails natively supports setting a "stream_chunk_size", we can remove this.
module ActiveStorage
  class Service::ParallelAzureStorageService < Service::AzureStorageService
    def stream(key)
      blob = blob_for(key)
      raise ActiveStorage::FileNotFoundError unless blob.present?

      stream_chunk_size = 300.megabytes
      offset = 0
      while offset < blob.properties[:content_length]
        _, chunk = client.get_blob(container, key, start_range: offset, end_range: offset + stream_chunk_size - 1)
        yield chunk.force_encoding(Encoding::BINARY)
        offset += stream_chunk_size
      end
    end
  end
end
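
For reference, here is a usage sketch of the stream override (assuming a Rails controller context with a live response stream; the key and surrounding code are illustrative, not part of this PR):

# Illustrative only: proxying a blob to the client chunk by chunk.
# ActiveStorage::Blob.service returns the configured service, here the
# ParallelAzureStorageService defined above.
service = ActiveStorage::Blob.service
service.stream("some-key") do |chunk|
  # Each yielded chunk is up to 300 MB, itself fetched as parallel ranged gets.
  response.stream.write(chunk)
end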

In that configuration, requests over 75 MB are fetched in parallel across 15 threads (each thread fetching 5 MB). Larger streamed requests are fetched 300 MB at a time (the stream_chunk_size above), again across 15 parallel threads (each thread fetching 20 MB).
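
To make the splitting arithmetic concrete, here is a minimal sketch of the kind of range partitioning this implies (split_range is a hypothetical helper for illustration, not the PR's actual internals):

# Hypothetical sketch: split a byte range into per-thread sub-ranges.
def split_range(start_range, end_range, threads)
  total = end_range - start_range + 1
  chunk = (total.to_f / threads).ceil
  (0...threads).map do |i|
    s = start_range + i * chunk
    e = [s + chunk - 1, end_range].min
    [s, e] if s <= end_range
  end.compact
end

split_range(0, 75_000_000 - 1, 15).first(2)
# => [[0, 4999999], [5000000, 9999999]]  # 5 MB per thread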

Uploading and downloading larger blobs in parallel gives a significant performance boost, roughly 2-3x in the benchmarks below:

require "azure/storage/blob"
require "benchmark"

threehundredmeg = File.read("rand"); :DONE # "rand" is a ~300 MB file of random bytes; :DONE keeps irb from echoing the whole string

parallel_client = Azure::Storage::Blob::BlobService.create(
  {
    storage_account_name: "exampleaccount",
    storage_access_key: "AAexample_keyB==",
    storage_blob_parallel_threads: 5,
    storage_blob_parallel_threshold: 5 * 1024 * 1024
  })
super_parallel_client = Azure::Storage::Blob::BlobService.create(
  {
    storage_account_name: "exampleaccount",
    storage_access_key: "AAexample_keyB==",
    storage_blob_parallel_threads: 15,
    storage_blob_parallel_threshold: 5 * 1024 * 1024,
    http_pool_size: 15
  })
single_client = Azure::Storage::Blob::BlobService.create(
  {
    storage_account_name: "exampleaccount",
    storage_access_key: "AAexample_keyB==",
    storage_blob_parallel_threads: 1,
    storage_blob_parallel_threshold: 5 * 1024 * 1024
  })

Benchmark.measure { single_client.create_block_blob("data", "threehundred_single", threehundredmeg) }
=> #<Benchmark::Tms:0x00005624909c9630 @label="", @real=22.831228854000074, @cstime=0.0, @cutime=0.0, @stime=0.1568090000000001, @utime=3.0190269999999995, @total=3.1758359999999994>
Benchmark.measure { parallel_client.create_block_blob("data", "threehundred_parallel", threehundredmeg) }
=> #<Benchmark::Tms:0x0000562490bc1a00 @label="", @real=9.799436117000027, @cstime=0.0, @cutime=0.0, @stime=0.27710099999999993, @utime=3.129546, @total=3.406647>

Benchmark.measure { single_client.get_blob("data", "threehundred_parallel", { start_range: 0, end_range: 300000000 }); :DONE }
=> #<Benchmark::Tms:0x0000562490c8bcd8 @label="", @real=13.33435237499998, @cstime=0.0, @cutime=0.0, @stime=0.4441289999999998, @utime=0.45342899999999986, @total=0.8975579999999996>
Benchmark.measure { parallel_client.get_blob("data", "threehundred_parallel", { start_range: 0, end_range: 300000000 }); :DONE }
=> #<Benchmark::Tms:0x0000562490c8bd00 @label="", @real=6.484543842999983, @cstime=0.0, @cutime=0.0, @stime=0.45743100000000014, @utime=0.4752680000000069, @total=0.932699000000007>
Benchmark.measure { super_parallel_client.get_blob("data", "threehundred_parallel", { start_range: 0, end_range: 300000000 }); :DONE }
=> #<Benchmark::Tms:0x00007fe3f83dbf60 @label="", @real=4.395406997000009, @cstime=0.0, @cutime=0.0, @stime=0.5267469999999999, @utime=0.5379800000000046, @total=1.0647270000000044>
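
Summarizing the @real (wall-clock) times above: the ~300 MB upload drops from ~22.8 s single-threaded to ~9.8 s with 5 threads (~2.3x), and the ranged get drops from ~13.3 s to ~6.5 s with 5 threads and ~4.4 s with 15 threads (~3x).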

I also updated DEFAULT_WRITE_BLOCK_SIZE_IN_BYTES to 5 MB to take advantage of high-throughput block blobs:

If possible, use blob or block sizes greater than 4 MiB for standard storage accounts and greater than 256 KiB for premium storage accounts. Larger blob or block sizes automatically activate high-throughput block blobs. High-throughput block blobs provide high-performance ingest that is not affected by partition naming.
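
As a side note, my own back-of-the-envelope arithmetic (assuming the previous default was the 4 MiB the quoted guidance mentions, and Azure's documented 50,000-committed-blocks limit per block blob), the larger default also raises the maximum blob size reachable before hitting the block limit:

# Rough arithmetic: max blob size under the 50,000-block limit, per block size.
MAX_BLOCKS = 50_000
[4, 5].map { |mib| MAX_BLOCKS * mib * 1024**2 / 1024**3 } # block sizes in MiB
# => [195, 244]  # approximate max blob size in GiB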

I don't think there's any real downside to defaulting to a value over 4MB, is there?