Open jamesls opened 9 years ago
Thanks for logging this! Currently I'm only calling list_distributions(..) and create_distribution(..) on a boto3.client('cloudfront') instance, which I know is a small fraction of the boto3 API, but it'd be nice if those calls cooperatively yielded between making the request and receiving the response so that other coroutines in my app could run in the meantime. (I could always just roll my own client using aiohttp too since I'm using so little of the API, but figured it was worth asking about anyway.)
In case it helps, aiodns provides an example of supporting asyncio on Python 2.6+ (see https://github.com/saghul/aiodns#python-versions).
If boto3 was already written with the assumption of blocking rather than asynchronous I/O throughout, I'm not sure how disruptive a change this would be.
I'm not that familiar with the internals of botocore, but I think the right approach is to create custom versions of endpoint.py and client.py that use aiohttp under the hood and yield as you want.
I would probably start by subclassing Endpoint and making aiohttp versions of _send_request and _get_request. I'd also subclass ClientCreator and override _create_api_method (that is where the calls to the endpoint come from). I haven't yet figured out how I'd get from a service object to a client created with my ClientCreator subclass.
Does that sound right, @jamesls? I might have a go...
I've started a port of botocore to asyncio, at:
https://github.com/rdbhost/botocore
So far, the S3 integration tests pass, as do about 2/3 of the unit tests. I hope to have all tests converted and passing by a week from now.
The port depends on yieldfrom.requests and yieldfrom.urllib3, asyncio ports of the requests and urllib3 libraries.
@rdbhost Awesome! Thanks for working on this. Definitely interested to follow your progress. @jamesls / botocore maintainers, had a chance to check this out / evaluate for merge potential?
+1
+1
This issue has gotten a couple of +1s in the last few days, so I thought I would point out that the work is done, at:
https://github.com/rdbhost/yieldfromBotocore
It implements an asyncio version of botocore, not boto3. I may eventually convert boto3, but my own needs have been satisfied by botocore, so other things now have priority.
David
+1
@rdbhost, thanks for tackling that! It seems, however, that https://github.com/rdbhost/yieldfromBotocore is 271 commits behind upstream. Are you actively maintaining it?
My intention has been to do a merge when upstream reached v1.0; that seems to have happened without my noticing.
I will be merging commits from upstream this weekend.
+1 @jamesls are you planning on pulling rdbhost's changes into botocore itself? We are very interested in this feature.
It is not a very promising candidate for merging back into botocore.
The changes are numerous, and the changes needed to make botocore functional within asyncio make it non-functional outside asyncio. I expect it to remain a separate product indefinitely, with a parallel API, meaning an API as similar as possible within the asyncio constraints.
One thing that also complicates things is that botocore supports as far back as python 2.6.5, so we'll need to figure out how we can support asyncio and still maintain py2 support. I see that there are asyncio backports to python2, so perhaps something could be done there.
Trollius (asyncio port to Python 2.x) is complete and stable.
+1
I have a working port of botocore for asyncio: https://github.com/jettify/aiobotocore, using aiohttp for async HTTP requests. I am trying to reuse as much botocore code as possible, so I patched only several classes and import the rest of the code; as a result, the library has only a few hundred lines of code. This approach helps to keep up with upstream, but the obvious downside is that I rely on internal interfaces, which are subject to change in new versions. The API is almost the same as botocore's; just add yield from or await (Python 3.5) before calls.
For now I am using aiobotocore with S3 and have ported almost all of the S3 tests, but I need to work more on pagination, since it is not easy to implement the iterator protocol with yield from.
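The pagination difficulty mentioned above comes from the fact that a plain for loop advances its iterator synchronously, so it cannot suspend while the next page is fetched. One workaround is to expose each page fetch as an explicit coroutine call. A minimal sketch, written with async/await for brevity; FakePaginator, next_page, and the page data are hypothetical stand-ins, not the aiobotocore API:

```python
import asyncio


class FakePaginator:
    """Hypothetical stand-in for a paginator that fetches pages over the network."""

    def __init__(self, pages):
        self._pages = list(pages)
        self._index = 0

    async def next_page(self):
        # Stand-in for non-blocking network I/O; other coroutines may run here.
        await asyncio.sleep(0)
        if self._index >= len(self._pages):
            return None  # signals that there are no more pages
        page = self._pages[self._index]
        self._index += 1
        return page


async def collect_keys(paginator):
    # An explicit loop replaces "for page in paginator", which could not suspend.
    keys = []
    while True:
        page = await paginator.next_page()
        if page is None:
            break
        keys.extend(page)
    return keys


def demo_collect_keys():
    paginator = FakePaginator([["a", "b"], ["c"]])
    return asyncio.run(collect_keys(paginator))
```

The cost of this design is that callers lose the familiar for-loop syntax, which is the gap that async iteration in Python 3.5 was later meant to close.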
+1 integrating aiobotocore might be doable
+1
+1
+1
+1
OK guys, I've written a production-ready, and hopefully useful to the community, asyncio extension to botocore. The library is currently included in https://github.com/quantmind/pulsar-cloud in the asyncbotocore module, but it is self-contained and could be stripped out if needed.
The API is the same as botocore but with the addition of an http_session keyword, which is an optional asyncio-compatible HTTP client (it must expose the _loop attribute) with an API similar to requests.
As far as I know there are two such clients:
_loop attribute?)
If used with pulsar, the library can also use greenlet in an implicit asynchronous fashion. Check https://github.com/quantmind/pulsar-cloud for more info.
By the way, thanks to @jettify for the initial effort, which I built on.
Feedback welcome!
@lsbardel yes, aiohttp.ClientSession has _loop but I don't encourage using private attributes.
BTW we use https://github.com/jettify/aiobotocore as aiohttp-based library.
@asvetlov cool, I understand the private attribute thing, but the _loop attribute should be (almost) a standard for asyncio objects ;-) maybe we should ask Guido.
I would be happy to use https://github.com/jettify/aiobotocore as a low-level async botocore library, but currently it does not work with the pulsar HTTP client. At the moment it clearly requires aiohttp, while cloud.asyncbotocore does not require an asyncio library as such.
So maybe we should converge on a library that allows for different HTTP clients?
By the way, all of that requires Python 3.4 or above.
@lsbardel Back when asyncio was under active development, Guido was against a public loop attribute; his reasoning was that "user always may pass explicit loop if needed". I don't think the decision has changed since.
+1
+1
+1
+1
+1
+1
How much work do you think it would take to add this functionality to boto3?
Boto3's interesting because of the resources layer which can make multiple API calls, e.g:
s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    for key in bucket.objects.all():
        print(key.key)
What would the ideal async API look like for resources?
Ideally the implementation would allow for callbacks/extension points so that people can integrate the non-blocking solution of their choice, whether gevent, asyncio, etc.
It can be tricky to expose that. Is boto3 using requests or something like that under the hood?
For example, prompt_toolkit didn't originally have async support but added the ability to be used in an event loop.
Also here is an example:
Here is another example discussing: Async code design
Steve Morin:
If what you are wanting is a way to make requests to AWS from an asyncio app, using an API very similar to botocore, look at:
https://github.com/rdbhost/yieldfromBotocore
It is based on an asyncio port of requests:
https://github.com/rdbhost/yieldfromRequests
The yieldfromBotocore library is not maintained anymore, as the boto3 API library was evolving too fast to keep up with, and I have other priorities. It does work on my production site, though, and possibly others.
Simply adding asyncio support to botocore itself is probably not viable, as the architectural differences between the asyncio and traditional blocking approaches are too profound to address both in one code base.
@jamesls using Python 3.5 features, the API is pretty straightforward:
s3 = boto3.resource('s3')
async for bucket in s3.buckets.async_all():
    async for key in bucket.objects.async_all():
        print(key.key)
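To make the proposal above concrete, here is a rough sketch of how a collection could support async for. It is an illustration only: FakeCollection and its page data are hypothetical, async_all is the name proposed in this comment rather than an existing boto3 method, and the async generator syntax used here needs Python 3.6 or later (on 3.5 the same effect requires writing __aiter__ and __anext__ by hand).

```python
import asyncio


class FakeCollection:
    """Hypothetical stand-in for a resource collection such as s3.buckets."""

    def __init__(self, pages):
        self._pages = pages

    async def _fetch_page(self, index):
        # Stand-in for one non-blocking API call that returns a page of items.
        await asyncio.sleep(0)
        return self._pages[index]

    async def async_all(self):
        # Async generator: suspends while each page is fetched,
        # then yields that page's items one at a time.
        for index in range(len(self._pages)):
            for item in await self._fetch_page(index):
                yield item


async def list_names(collection):
    names = []
    async for name in collection.async_all():
        names.append(name)
    return names


def demo_list_names():
    collection = FakeCollection([["bucket-1", "bucket-2"], ["bucket-3"]])
    return asyncio.run(list_names(collection))
```

Each page fetch is a suspension point, so other coroutines can run between API calls while the caller still writes an ordinary-looking loop.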
@smorin prompt_toolkit uses threads for some blocking calls. You can do the same with boto: just create a big enough ThreadPoolExecutor and use it for each boto call that performs I/O. This solution will work absolutely fine (with some exceptions, of course).
I am looking to run this on AWS Lambda, so I am trying to avoid threads, both for that reason and because of the performance penalty.
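The thread-pool approach described above can be sketched with the standard library alone. This is a minimal illustration, assuming a stand-in blocking function; blocking_call is hypothetical and only simulates a boto call with a short sleep, and the pool size of 8 is an arbitrary choice.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(name):
    """Hypothetical stand-in for a blocking boto call that performs I/O."""
    time.sleep(0.05)
    return {"name": name}


async def run_calls(names, max_workers=8):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each blocking call runs in a worker thread; the event loop stays free.
        futures = [loop.run_in_executor(pool, blocking_call, n) for n in names]
        # gather preserves the order of the submitted calls.
        return await asyncio.gather(*futures)


def demo_run_calls():
    return asyncio.run(run_calls(["a", "b", "c"]))
```

The calls overlap in time as long as the pool has a free worker, which is why the pool needs to be "big enough" for the expected concurrency.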
+1
I'm downloading files from S3 in my Lambda function, and my Lambda cost is extremely high because the boto3 file download operation is blocking. I am really looking forward to a way to do this without asyncio that works in Python 2.7.
+1
+1
Twisted now supports asyncio, and Twisted supports both Python 2 and Python 3.
If you ported botocore to twisted you could expose a blocking wrapper in boto3, and users could use botocore asynchronously via asyncio and twisted in Python 3 and just twisted in Python 2.
@graingert I do not think it is the best approach.
A better approach is to move the request preparation and response parsing logic out of the I/O library; as a result, anyone can add support for a new framework, whether async or sync (https://sans-io.readthedocs.io/). botocore is doing a good job there: I managed to hack asyncio support into aiobotocore very quickly and with a small amount of code.
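The sans-I/O idea above can be illustrated with a small sketch: request preparation and response parsing are plain functions, and only a thin driver touches the transport. Everything here is hypothetical (the function names, the request dictionary layout, and the echoing fake transports); it shows the shape of the separation, not botocore's actual internals.

```python
import asyncio
import json


def prepare_request(operation, params):
    """Pure function: build a transport-agnostic description of the request."""
    return {
        "method": "POST",
        "path": "/" + operation,
        "body": json.dumps(params, sort_keys=True),
    }


def parse_response(status, body):
    """Pure function: turn a raw status and body into a result or an error."""
    if status != 200:
        raise RuntimeError("request failed with status %d" % status)
    return json.loads(body)


def fake_blocking_send(request):
    """Hypothetical blocking transport; simply echoes the request body."""
    return 200, request["body"]


async def fake_async_send(request):
    """Hypothetical asyncio transport; simply echoes the request body."""
    await asyncio.sleep(0)
    return 200, request["body"]


def call_sync(operation, params):
    request = prepare_request(operation, params)
    return parse_response(*fake_blocking_send(request))


async def call_async(operation, params):
    request = prepare_request(operation, params)
    return parse_response(*(await fake_async_send(request)))
```

Both drivers share prepare_request and parse_response unchanged, so supporting another framework means writing only another small send function and driver.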
@jettify yeah that's an even better solution.
Updated pulsar-cloud to be compatible with botocore 1.4.61
Feedback welcome
+1
There is a new thread open on the aiobotocore side [1] with a proposal to align botocore and aiobotocore. We would like to get more opinions, especially from the botocore maintainers such as @jamesls.
+1
This is a tracking issue for the feature request of supporting asyncio in botocore, originally asked about here: https://github.com/boto/botocore/issues/452
There's no definitive timeline on this feature, but feel free to +1 (thumbs up 👍) this issue if this is something you'd like to see. Also, if you have any additional information about asyncio you'd like to share (even just about your specific use case) feel free to chime in.