Open kumaran-sowrirajan opened 3 months ago
@ibrahimrabab @ibrandes @seanmcc-msft
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Thank you for filing this @kumaran-sowrirajan.
Could you share a reproduction of the code you're using to upload with getBlobOutputStream so we can further investigate why the upload speed is slow? Additionally, you mentioned that write speed is slow when downloading and uploading occur at the same time. Could you share performance information for when only the upload is happening? It is possible that the network or the CPU is at capacity, resulting in slower upload performance.
Thank you for the response, @alzimmermsft. Let me get the code so you can reproduce it on your end.
Hi @kumaran-sowrirajan. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario, please respond to the question asked above or provide the information requested above. This will help us more accurately address your issue.
Hi @kumaran-sowrirajan, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!
Query/Question
Our application uses the following API (getBlobOutputStream) to write data into the Azure Data Lake container.
When streaming writes through the above API, I see slow throughput: it takes over a minute to write 1 GB of data to the Data Lake container. We tested with Avro files of varying sizes. I hope neither the file type nor the file size is causing these performance issues.
Our application is deployed on a virtual machine. The source file is stored on the virtual machine's OS storage. Our API reads the file contents from the source and writes them to the data lake via the stream API. When both the read and the write happen on the virtual machine, the write speed is very slow. Both the storage account and the virtual machine are located in the same region.
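For reference, 1 GB in a little over a minute works out to roughly 17 MiB/s or less. One way to see whether the local read or the remote write is the limiting side is to time the copy loop on its own. The following is only a minimal sketch using the Java standard library, not our actual application code: the class name `StreamCopyTimer`, the helper names, and the 4 MiB buffer size are illustrative assumptions, and the in-memory streams are stand-ins for the source file stream and the stream returned by getBlobOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for timing a stream-to-stream copy; not part of any Azure SDK.
public class StreamCopyTimer {

    /** Copies all bytes from in to out using a buffer of bufferSize bytes; returns the byte count. */
    public static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    /** Converts a byte count and an elapsed time in nanoseconds into MiB per second. */
    public static double mibPerSecond(long bytes, long elapsedNanos) {
        double seconds = elapsedNanos / 1_000_000_000.0;
        return (bytes / (1024.0 * 1024.0)) / seconds;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins (assumption): in the real scenario, `in` would read the
        // source file and `out` would be the stream returned by getBlobOutputStream.
        byte[] data = new byte[8 * 1024 * 1024];
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long start = System.nanoTime();
            long copied = copy(in, out, 4 * 1024 * 1024); // 4 MiB buffer, an arbitrary choice
            long elapsed = System.nanoTime() - start;
            System.out.printf("copied %d bytes at %.1f MiB/s%n",
                    copied, mibPerSecond(copied, elapsed));
        }
    }
}
```

Running the same loop once against a local sink and once against the remote stream should show which side dominates the elapsed time.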
Virtual machines:
Standard_E4ds_v4
Standard_D64ds_v4
Why is this not a Bug or a feature Request?
What are the best practices to follow to speed up both the read and the write? We are not sure which configuration parameter we are missing to make both the read and the write faster.
Setup (please complete the following information if applicable):