Open kutoga opened 5 years ago
Hi @kutoga, thanks for reaching out!
I have recorded your feedback and will discuss it with the Team. Thanks!
Roberto Prevato came up with a simple solution for the problem, but it would need to be fully integrated into the library: https://robertoprevato.github.io/Upgrading-Azure-Storage-Python-SDK-to-support-asyncio/
Is this approach the direction your team wants to take? If so, I would gladly help make this change and submit a PR once I am done. This SDK is almost useless in a production-grade environment without async support.
@zezha-msft Please keep me and @johanste in the loop on these discussions ;). As you know, that's a topic I care a lot about :D
Hi @svencowart, thanks for reaching out! We really appreciate your feedback!
Could you please elaborate on your use case and why the lack of async support is hindering it? Both @seguler and I are interested in learning more about your scenario.
Sure. I see it being a big problem whenever some kind of messaging service needs to process blobs locally. In my specific scenario, I have a websocket server that needs to download video files from blob storage and then process them locally. While a video is being downloaded, the server should be able to continue responding to incoming events. Right now, calling BlockBlobService.get_blob_to_path prevents that from happening.
In my case, the software is connected to a message queue: For each message a (sometimes large) blob has to be downloaded and processed. While a blob is downloaded, another one could be processed. Unfortunately, without asyncio support it is not that easy to implement this.
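The pattern described in the two comments above (keep working on one blob while the next one downloads) can be sketched with plain asyncio. This is only an illustration under stated assumptions: `fake_download` and `process` are hypothetical stand-ins, and `asyncio.sleep` simulates network latency instead of calling any Azure Storage API.

```python
import asyncio


async def fake_download(name: str) -> bytes:
    # Hypothetical stand-in for a non-blocking blob download.
    await asyncio.sleep(0.05)  # simulated network wait; yields to the event loop
    return name.encode()


def process(data: bytes) -> str:
    # Hypothetical local processing step, kept trivial on purpose.
    return data.decode().upper()


async def downloader(names, queue):
    # Producer: downloads blobs one after another and hands them over.
    for name in names:
        await queue.put(await fake_download(name))
    await queue.put(None)  # sentinel: no more blobs


async def processor(queue, results):
    # Consumer: processes each blob as soon as it is available.
    while True:
        data = await queue.get()
        if data is None:
            break
        results.append(process(data))


async def run_pipeline(names):
    queue = asyncio.Queue(maxsize=2)  # bounded so downloads cannot run far ahead
    results = []
    await asyncio.gather(downloader(names, queue), processor(queue, results))
    return results


pipeline_results = asyncio.run(run_pipeline(["a.mp4", "b.mp4", "c.mp4"]))
```

In a real service, heavy CPU-bound processing would still block the event loop, so it would probably need to run in an executor or a separate process.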
Hi @zezha-msft, regarding your question "Could you please elaborate on your use case and why the lack of async support is hindering it?": I have a very good use case.
Consider the ASP.NET Core web framework and how it performs better than older versions of ASP.NET. Internally it uses the Kestrel server, which in turn uses the libuv networking library, so it runs on an event loop (or several, together with multi-threading). I think most of the .NET community has embraced asynchronous code, and nobody who has the option of using ASP.NET Core will go back to the older web framework when coding in C# or other .NET languages.
asyncio is Python's built-in framework for event loops and non-blocking IO operations. It enables much greater concurrency than synchronous code. As a side note, it supports different event loop implementations and can be used with the same libuv used by Kestrel. The brilliant Yury Selivanov wrote a library called uvloop, which is a Cython wrapper around the libuv library (first article about uvloop).
In my case, if I want to create a Python web application that works fast, I will choose asyncio. I am even building my own web framework to benefit from the speed of static typing given by Cython, when parsing web requests and responses. Since I like Azure Storage very much, I also wrote (for private use) functions to upload files to Blob Storage with asyncio. In this context, I use the official SDK only to generate shared access signatures.
@kutoga - I wrote functions to upload files of any size to Azure Blob Service with asyncio. Currently I keep this code private in Azure DevOps, but I can share it on GitHub for you, if you wish. You could also use multi-threading in Python to do concurrent uploads with the official storage SDK. The GIL is not a problem in this scenario, because threads release it while they wait on network I/O.
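A minimal sketch of the multi-threading workaround mentioned above, assuming a blocking upload call. `blocking_upload` is a hypothetical stand-in that uses `time.sleep` to simulate network I/O; it is not an SDK function.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_upload(path: str) -> str:
    # Hypothetical stand-in for a blocking (synchronous) SDK upload call.
    time.sleep(0.1)  # simulated network I/O; the GIL is released while sleeping
    return f"uploaded:{path}"


paths = ["a.bin", "b.bin", "c.bin", "d.bin"]

# Each upload runs in its own worker thread, so the waits overlap.
with ThreadPoolExecutor(max_workers=4) as pool:
    upload_results = list(pool.map(blocking_upload, paths))
```

`pool.map` returns results in the order of the inputs, regardless of which upload finishes first.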
Hi @kutoga @svencowart @RobertoPrevato, thank you so much for providing these insights! I'll discuss with the Team to see what the timeline for this work is.
@svencowart to answer your earlier question: if we were to officially support asyncio, we would most likely take the opportunity to do a complete rewrite, in order to adopt the new layered architecture.
Hi @kutoga, @svencowart I published my code to download and upload big files to blob service using asyncio and aiohttp, here: https://github.com/RobertoPrevato/AzureBlobAsyncUpload.
Please read the note I put in the README file: you might have a different scenario in mind (concurrent upload of chunks for every single file, instead of concurrent upload of different files). The code should be clear enough, but if anything I wrote is not clear please let me know. The part making uploads in chunks is here.
I also have code to read files from Blob Storage in chunks in an asyncio-friendly way (using async for), but I haven't had time to clean it up for sharing. I will do it when I get a free moment.
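For readers unfamiliar with the async for pattern mentioned here, a rough sketch follows. It is not the code referred to above: `read_chunks` is a hypothetical async generator that slices an in-memory bytes object, where a real implementation would await a ranged HTTP read for each chunk.

```python
import asyncio
from typing import AsyncIterator, List


async def read_chunks(data: bytes, chunk_size: int) -> AsyncIterator[bytes]:
    # Hypothetical chunk reader: yields one slice per iteration.
    for offset in range(0, len(data), chunk_size):
        await asyncio.sleep(0)  # placeholder for awaiting a ranged network read
        yield data[offset:offset + chunk_size]


async def collect_sizes(data: bytes, chunk_size: int) -> List[int]:
    sizes = []
    # The event loop can run other tasks between chunks.
    async for chunk in read_chunks(data, chunk_size):
        sizes.append(len(chunk))
    return sizes


chunk_sizes = asyncio.run(collect_sizes(b"x" * 10, 4))
```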
Code to download with asyncio is shared there
Thanks @zezha-msft for your kind words, I hope I didn't sound "know-it-all" in my messages where I recommended asyncio; I am just fond of Python and Azure. :innocent: And thanks @lmazuel for being interested in this subject. PS. I found this article: Python at Microsoft: flying under the radar very interesting!
@RobertoPrevato Thank you for sharing the code :) I will play around with it.
I would love async support for even basic API operations. I'm supporting subscriptions with thousands of Azure objects to work against and dealing with requests one by one is certainly limiting!
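A sketch of what issuing many requests concurrently, instead of one by one, could look like with asyncio. `fake_request` is a hypothetical stand-in for a single API call, and the limit of 50 in-flight requests is an arbitrary assumption, not a service recommendation.

```python
import asyncio


async def fake_request(item_id: int) -> int:
    # Hypothetical stand-in for one non-blocking API call.
    await asyncio.sleep(0.001)  # simulated network latency
    return item_id * 2


async def fetch_all(ids, limit: int = 50):
    # The semaphore caps the number of requests in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(item_id: int) -> int:
        async with semaphore:
            return await fake_request(item_id)

    # gather preserves the order of the inputs in its result list.
    return await asyncio.gather(*(bounded(i) for i in ids))


request_results = asyncio.run(fetch_all(range(1000)))
```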
It would be useful for core libraries to specifically target Python's async API without requiring an implementation-specific solution like asyncio. That way different async libraries could also be supported, like trio or curio.
@agates trio should definitely be part of the picture. And if we design it to be asyncio- and trio-ready, it is likely generic enough to support curio as well. Note that according to curio's main maintainer (directly, as in a live discussion at PyCon 2018 :)), people should stop using curio and use trio instead.
Azure Functions for Python is built on top of asyncio. Supporting asyncio within this library would allow development of concurrent functions that interact with Azure Storage.
https://github.com/Azure/azure-functions-python-worker/wiki/Worker-Architecture
@bmc-msft thanks for the feedback! We are actively working on providing async support. Please stay tuned!
Hi all,
The async support was just shipped here: https://pypi.org/project/azure-storage-blob/12.0.0b2/
It is a rewrite of the SDK, and it has asyncio support. Version 12 will become GA by the end of the year.
Please let me know if you have any questions or concerns.
For those looking for the source for the rewrite: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage
Which version of the SDK was used? Please provide the output of pip freeze.
What problem was encountered?
Will there be any support for asyncio? I think the library would perform much better if asynchronous operations were supported. Thank you very much.