Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

How to improve the write speed when using the stream API #41431

Open kumaran-sowrirajan opened 3 months ago

kumaran-sowrirajan commented 3 months ago

Query/Question

Our application uses the following API (getBlobOutputStream) to write data into an Azure Data Lake container.

Writes streamed through this API are slow: it takes more than a minute to write 1 GB of data to the Data Lake container, which works out to roughly 17 MB/s or less. We tested with Avro files of varying sizes. I assume neither the file type nor the file size should cause this performance issue.

Our application is deployed on a virtual machine. The source file is stored on the virtual machine's OS disk. Our API reads the file contents from the source and writes them to the Data Lake via the stream API. When both the read and the write happen on the virtual machine, the write speed is very slow. The storage account and the virtual machine are located in the same region.

Virtual machines:

Standard E4ds v4, Standard_D64ds_v4

Why is this not a Bug or a feature Request?

What are the best practices to follow to speed up both the read and the write? We are not sure which configuration parameter we are missing to make both faster.

Setup (please complete the following information if applicable):

github-actions[bot] commented 3 months ago

@ibrahimrabab @ibrandes @seanmcc-msft

github-actions[bot] commented 3 months ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

alzimmermsft commented 3 months ago

Thank you for filing this @kumaran-sowrirajan.

Could you share a reproduction of the code you're using to upload with getBlobOutputStream so we can investigate why the upload speed is slow? Additionally, you mentioned that the write speed is slow when reading and uploading occur at the same time. Could you share performance information for when only the upload is happening? It's possible that the network or the CPU is at capacity, resulting in slower upload performance.
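One possible way to collect the upload-only numbers asked for above is a small timing harness that copies a stream into any `OutputStream` and reports throughput. The sketch below uses only the JDK (Java 11+); `ThroughputProbe`, `timedCopy`, and the 4 MiB buffer size are hypothetical choices for illustration and are not part of the Azure SDK. In a real run the sink would be the stream returned by getBlobOutputStream, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not an Azure SDK class): times a stream-to-stream copy.
public class ThroughputProbe {

    /** Bytes moved and elapsed wall-clock time for one timed copy. */
    public static final class Result {
        public final long bytes;
        public final long nanos;

        Result(long bytes, long nanos) {
            this.bytes = bytes;
            this.nanos = nanos;
        }

        /** Throughput in MB/s (1 MB = 1,000,000 bytes). */
        public double megabytesPerSecond() {
            double seconds = Math.max(nanos, 1L) / 1_000_000_000.0;
            return (bytes / 1_000_000.0) / seconds;
        }
    }

    /**
     * Copies everything from {@code in} to {@code out} using a buffer of
     * {@code bufferSize} bytes. The sink is closed inside the timed region,
     * because a buffering sink may defer work until close().
     */
    public static Result timedCopy(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        long start = System.nanoTime();
        try (OutputStream sink = out) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
                total += n;
            }
        }
        return new Result(total, System.nanoTime() - start);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ThroughputProbe <source-file>");
            return;
        }
        Path source = Path.of(args[0]);
        int bufferSize = 4 * 1024 * 1024; // assumed value, tune as needed

        // Read-only baseline: discard the bytes to see how fast the local disk is.
        // Replace the null sink with the upload stream to time the write path.
        try (InputStream in = Files.newInputStream(source)) {
            Result r = timedCopy(in, OutputStream.nullOutputStream(), bufferSize);
            System.out.printf("read-only: %d bytes, %.1f MB/s%n",
                    r.bytes, r.megabytesPerSecond());
        }
    }
}
```

Running the copy once against a discarding sink and once against the upload stream gives a rough split between local read speed and upload speed. Watching CPU and network usage on the virtual machine during both runs would help show whether either one is saturated.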

kumaran-sowrirajan commented 3 months ago

Thank you for the response, @alzimmermsft. Let me get the code so you can reproduce it on your end.

github-actions[bot] commented 1 week ago

Hi @kumaran-sowrirajan. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] commented 4 days ago

Hi @kumaran-sowrirajan, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!