Azure / azure-storage-python

Microsoft Azure Storage Library for Python
https://azure-storage.readthedocs.io
MIT License
338 stars, 240 forks

asyncio support? #534

Open kutoga opened 5 years ago

kutoga commented 5 years ago

Which version of the SDK was used? Please provide the output of pip freeze.

azure-storage-blob==1.1.0
azure-storage-common==1.1.0
azure-storage-file==1.1.0
azure-storage-nspkg==3.0.0
azure-storage-queue==1.1.0

What problem was encountered?

Will there be any support for asyncio? I think the library would perform much better if asynchronous operations were supported.

Thank you very much

zezha-msft commented 5 years ago

Hi @kutoga, thanks for reaching out!

I have recorded your feedback and will discuss it with the Team. Thanks!

svencowart commented 5 years ago

Roberto Prevato came up with a simple solution for the problem, but it would need to be fully integrated into the library. https://robertoprevato.github.io/Upgrading-Azure-Storage-Python-SDK-to-support-asyncio/

Is this approach the direction your team wants to take? If so, I would gladly help make this change and submit a PR once I am done. This SDK is almost useless in a production-grade environment without async support.

lmazuel commented 5 years ago

@zezha-msft Please keep me and @johanste in the loop on these discussions ;). As you know, that's a topic I care a lot about :D

zezha-msft commented 5 years ago

Hi @svencowart, thanks for reaching out! We really appreciate your feedback!

Could you please elaborate on your use case and why the lack of async support is hindering it? Both @seguler and I are interested in learning more about your scenario.

svencowart commented 5 years ago

Sure, I see it being a big problem whenever some type of messaging service needs to process blobs locally. In my specific scenario, I have a websocket server that needs to download video files from blob storage and then process them locally. While a video is being downloaded, the server should be able to continue responding to incoming events. Right now, calling BlockBlobService.get_blob_to_path blocks the server and prevents that from happening.
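As an illustration of the blocking problem described above, here is a minimal sketch of one common workaround: pushing a blocking call onto a worker thread with `loop.run_in_executor` so the event loop can keep serving other events. Everything here is an assumption for illustration only. `blocking_download` is a hypothetical stand-in for a synchronous SDK call such as `get_blob_to_path`, and `heartbeat` stands in for the websocket server handling other events.

```python
import asyncio
import time


def blocking_download(path: str) -> str:
    """Hypothetical stand-in for a blocking SDK call such as get_blob_to_path."""
    time.sleep(0.2)  # simulate network I/O that blocks the calling thread
    return path


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Stand-in for the server responding to other events during the download."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def download_without_blocking(path: str) -> tuple:
    loop = asyncio.get_running_loop()
    ticks: list = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    # Run the blocking call in the default thread pool; the event loop stays free.
    result = await loop.run_in_executor(None, blocking_download, path)
    stop.set()
    await beat
    return result, len(ticks)


if __name__ == "__main__":
    path, tick_count = asyncio.run(download_without_blocking("video.mp4"))
    print(f"downloaded {path}; loop handled {tick_count} heartbeats meanwhile")
```

This only hides the blocking call behind a thread. It does not make the underlying HTTP transfer asynchronous, which is what native asyncio support in the SDK would provide.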

kutoga commented 5 years ago

In my case, the software is connected to a message queue: for each message, a (sometimes large) blob has to be downloaded and processed. While one blob is being downloaded, another one could be processed. Unfortunately, without asyncio support this is not that easy to implement.
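The overlap described above (downloading the next blob while the current one is processed) could look roughly like the following producer/consumer sketch. It is a sketch under stated assumptions: `fake_download` and `process` are hypothetical placeholders, not SDK functions, and the download is simulated with `asyncio.sleep`.

```python
import asyncio


async def fake_download(name: str) -> bytes:
    """Hypothetical async download; a real one would await network I/O."""
    await asyncio.sleep(0.05)
    return name.encode()


def process(data: bytes) -> str:
    """Hypothetical processing step (placeholder for real, possibly CPU-heavy work)."""
    return data.decode().upper()


async def downloader(names, queue: asyncio.Queue) -> None:
    for name in names:
        await queue.put(await fake_download(name))
    await queue.put(None)  # sentinel: no more blobs


async def processor(queue: asyncio.Queue) -> list:
    loop = asyncio.get_running_loop()
    results = []
    while True:
        data = await queue.get()
        if data is None:
            return results
        # Process in a worker thread so the next download can proceed meanwhile.
        results.append(await loop.run_in_executor(None, process, data))


async def run_pipeline(names) -> list:
    # A small bound keeps at most two downloaded blobs waiting in memory.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    _, results = await asyncio.gather(downloader(names, queue), processor(queue))
    return results


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["a", "b", "c"])))
```

The bounded queue is a design choice: it lets downloads run ahead of processing without letting large blobs pile up in memory.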

RobertoPrevato commented 5 years ago

Hi @zezha-msft, you asked: "Could you please elaborate on your use case and why the lack of async support is hindering it?" I have a very good use case.

Consider the ASP.NET Core web framework and how it performs better than older versions of ASP.NET. Internally it uses the Kestrel server, which in turn uses the libuv networking library, so it is using an event loop (or more than one, together with multi-threading). I think most of the .NET community has embraced asynchronous code, and nobody who has the possibility to use ASP.NET Core will go back to the older web framework when coding in C# or other .NET languages.

asyncio is Python's built-in framework for event loops and non-blocking IO operations. It enables much greater concurrency than synchronous code. As a side note, it supports different implementations of event loops and it can be used with the same libuv used by Kestrel. The brilliant Yury Selivanov wrote a library, called uvloop, which is a Cython wrapper around the libuv library (first article about uvloop).

In my case, if I want to create a Python web application that works fast, I will choose asyncio. I am even building my own web framework to benefit from the speed of static typing given by Cython, when parsing web requests and responses. Since I like Azure Storage very much, I also wrote (for private use) functions to upload files to Blob Storage with asyncio. In this context, I use the official SDK only to generate shared access signatures.

@kutoga - I wrote functions to upload files of any size to Azure Blob Service with asyncio. Currently I keep this code private in Azure DevOps, but I can share it on GitHub for you, if you wish. You could also use multi-threading with Python to do concurrent uploads with the official storage SDK. The GIL is not a problem in this scenario.
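The multi-threading alternative mentioned above can be sketched with the standard library's `ThreadPoolExecutor`. This is an illustration only: `blocking_upload` is a hypothetical stand-in for a synchronous SDK upload call and simulates I/O with `time.sleep`. Threads waiting on blocking I/O release the GIL, which is why the uploads can overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_upload(name: str) -> str:
    """Hypothetical stand-in for a synchronous SDK upload call."""
    time.sleep(0.1)  # simulated network I/O; the GIL is released while sleeping
    return f"uploaded:{name}"


def upload_all(names, max_workers: int = 4) -> list:
    """Upload several files concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(blocking_upload, names))


if __name__ == "__main__":
    print(upload_all(["a.bin", "b.bin", "c.bin", "d.bin"]))
```

`pool.map` preserves input order, so the results line up with the file names even though the uploads finish in an arbitrary order.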

zezha-msft commented 5 years ago

Hi @kutoga @svencowart @RobertoPrevato, thank you so much for providing these insights! I'll discuss with the Team to see what is the timeline for this work.

@svencowart to answer your earlier question: if we were to officially support asyncio, we would most likely take the opportunity to do a complete rewrite, in order to adopt the new layered architecture.

RobertoPrevato commented 5 years ago

Hi @kutoga, @svencowart, I published my code to download and upload big files to the Blob service using asyncio and aiohttp here: https://github.com/RobertoPrevato/AzureBlobAsyncUpload.

Please read the note I put in the README file: you might have a different scenario in mind (concurrent upload of chunks for every single file, instead of concurrent upload of different files). The code should be clear enough, but if anything I wrote is not clear please let me know. The part making uploads in chunks is here.

I also have code to read files from Blob Storage in chunks in an asyncio-friendly way (using async for), but I haven't had time to clean it up for sharing. I will do it when I get a free moment. Code to download with asyncio is shared there.
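For readers unfamiliar with the `async for` pattern mentioned above, here is a minimal sketch of chunked reading with an async generator. It is not the code from the linked repository: `iter_chunks` is a hypothetical helper that slices an in-memory `bytes` object, where a real implementation would await one ranged network read per chunk.

```python
import asyncio


async def iter_chunks(data: bytes, chunk_size: int):
    """Yield successive chunks; a real reader would await a network read here."""
    for start in range(0, len(data), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield data[start:start + chunk_size]


async def collect_sizes(data: bytes, chunk_size: int) -> list:
    """Consume the generator with async for and record each chunk's size."""
    sizes = []
    async for chunk in iter_chunks(data, chunk_size):
        sizes.append(len(chunk))
    return sizes


if __name__ == "__main__":
    print(asyncio.run(collect_sizes(b"x" * 10, 4)))  # prints [4, 4, 2]
```

Because control returns to the event loop between chunks, other tasks can make progress while a large blob is being read.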

Thanks @zezha-msft for your kind words, I hope I didn't sound like a "know-it-all" in my messages where I recommended asyncio; I am just fond of Python and Azure. :innocent: And thanks @lmazuel for being interested in this subject. PS. I found the article "Python at Microsoft: flying under the radar" very interesting!

kutoga commented 5 years ago

@RobertoPrevato

Thank you for sharing the code :) I will play around with it.

agates commented 5 years ago

I would love async support for even basic API operations. I'm supporting subscriptions with thousands of Azure objects to work against and dealing with requests one by one is certainly limiting!
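To make the one-by-one limitation above concrete, here is a sketch of issuing many requests concurrently with `asyncio.gather`, bounded by a semaphore. It is illustrative only: `fake_request` is a hypothetical placeholder for a single API call, and the limit of 50 is an arbitrary assumption, not a recommended value.

```python
import asyncio


async def fake_request(object_id: int) -> int:
    """Hypothetical stand-in for one API call against a single Azure object."""
    await asyncio.sleep(0.01)  # simulated network latency
    return object_id * 2


async def fetch_all(object_ids, limit: int = 50) -> list:
    """Run all requests concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(object_id: int) -> int:
        async with semaphore:
            return await fake_request(object_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(i) for i in object_ids))


if __name__ == "__main__":
    results = asyncio.run(fetch_all(range(1000)))
    print(len(results))
```

The semaphore keeps the number of simultaneous requests bounded, which helps avoid overwhelming the service or hitting throttling limits.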

It would be useful for core libraries to specifically target Python's async API without requiring an implementation-specific solution like asyncio. That way, other async libraries such as trio or curio could also be supported.

lmazuel commented 5 years ago

@agates trio should definitely be part of the picture. And if we design it to be asyncio- and trio-ready, it will likely be generic enough to support curio as well. Note that according to curio's main maintainer directly (directly as in: a live discussion at PyCon 2018 :)), people should stop using curio and use trio instead.

bmc-msft commented 5 years ago

Azure Functions for Python is built on top of asyncio. Supporting asyncio within this library would allow development of concurrent functions that interact with Azure Storage.

https://github.com/Azure/azure-functions-python-worker/wiki/Worker-Architecture

zezha-msft commented 5 years ago

@bmc-msft thanks for the feedback! We are actively working on providing async support. Please stay tuned!

zezha-msft commented 5 years ago

Hi all,

The async support was just shipped here: https://pypi.org/project/azure-storage-blob/12.0.0b2/

It is a rewrite of the SDK, and it has asyncio support. Version 12 will become GA by the end of the year.

Please let me know if you have any questions or concerns.

bmc-msft commented 5 years ago

For those looking for the source for the rewrite: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage