ajs97 opened this issue 3 years ago
```python
import asyncio
from azure.storage.blob.aio import BlobServiceClient  # the aio client is required for async uploads

connection_string = "<your-connection-string>"  # placeholder
blob_size = 64 * 1024 * 1024                    # 64 MB per blob
total_size = 5 * 1024 * 1024 * 1024             # e.g. 5 GB in total

async def upload_blob(container_name, blob_name, data):
    async with BlobServiceClient.from_connection_string(connection_string) as blob_service_client:
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
        await blob_client.upload_blob(data, overwrite=True)

async def main():
    container_name = "<your-container>"  # placeholder
    tasks = []
    for i in range(total_size // blob_size):
        blob_name = f"blob_{i}"
        data = b"Your 64MB data here"  # Replace with your actual data
        tasks.append(asyncio.ensure_future(upload_blob(container_name, blob_name, data)))
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
```
Hi, I am facing some performance issues while uploading blobs to Azure Blob Storage using this SDK. I am uploading 64MB blobs, experimenting with various values of `parallelism_factor` (4/8/16). When I upload around 1GB of data with parallelism = 8/16, I get around 110MBps, but when I increase the total to about 5GB, the overall throughput drops to around 50MBps. Checking the intermediate throughput, I see 80-90MBps for the first few blobs, but for subsequent blobs it drops to 40-50MBps, and sometimes even down to 20MBps. Note that I am uploading these blobs sequentially.
Do you know what the possible reason could be for the throughput difference as the total size grows, and whether there is some configuration that would give better throughput for large amounts of data?
Note that for my use case it is important to upload the data in 64MB blobs, and the total amount uploaded will be in the tens of GBs, so I would like to optimize for that case. Thanks.
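One pattern worth comparing against the snippet above is bounding the number of in-flight uploads with an `asyncio.Semaphore`, so that only `parallelism_factor` uploads run at once rather than scheduling every task up front. This is a minimal stdlib-only sketch, not the Azure SDK's own mechanism: `upload_blob_stub`, `bounded_upload`, and `upload_all` are hypothetical names, and the stub only simulates the I/O; the real `blob_client.upload_blob(...)` call would replace it.

```python
import asyncio

async def upload_blob_stub(blob_name: str, data: bytes) -> str:
    # Stand-in for the real Azure upload call; sleeps briefly to simulate network I/O.
    await asyncio.sleep(0.01)
    return blob_name

async def bounded_upload(sem: asyncio.Semaphore, blob_name: str, data: bytes) -> str:
    # The semaphore caps how many uploads are in flight at any moment.
    async with sem:
        return await upload_blob_stub(blob_name, data)

async def upload_all(num_blobs: int, parallelism_factor: int) -> list:
    sem = asyncio.Semaphore(parallelism_factor)
    tasks = [
        asyncio.create_task(bounded_upload(sem, f"blob_{i}", b"payload"))
        for i in range(num_blobs)
    ]
    # gather preserves submission order in its result list.
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    names = asyncio.run(upload_all(num_blobs=16, parallelism_factor=8))
    print(len(names))
```

Keeping a fixed cap like this makes the measured throughput directly comparable across `parallelism_factor` values, since the number of concurrent connections never exceeds the configured limit.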