boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0

Support asyncio #458

Open jamesls opened 9 years ago

jamesls commented 9 years ago

This is a tracking issue for the feature request of supporting asyncio in botocore, originally asked about here: https://github.com/boto/botocore/issues/452

There's no definitive timeline on this feature, but feel free to +1 (thumbs up 👍) this issue if this is something you'd like to see. Also, if you have any additional information about asyncio you'd like to share (even just about your specific use case) feel free to chime in.

jab commented 9 years ago

Thanks for logging this! Currently I'm only calling list_distributions(..) and create_distribution(..) on a boto3.client('cloudfront') instance, which I know is a small fraction of the boto3 API, but it'd be nice if those calls cooperatively yielded in between making the request and receiving the response so that other coroutines in my app could run in the meantime. (I could always just roll my own client using aiohttp too since I'm using so little of the API, but figured it was worth asking about anyway.)
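To make the "cooperatively yield between request and response" idea concrete, here is a minimal sketch. It does not call boto3 at all: `fake_request` is a hypothetical stand-in in which `asyncio.sleep` models the time spent waiting on the network, and it assumes a modern Python (3.7+) for `asyncio.run`.

```python
import asyncio


async def fake_request(name, delay, events):
    # Hypothetical stand-in for an AWS call; the sleep models network latency.
    events.append(f"{name}:sent")
    await asyncio.sleep(delay)  # control returns to the event loop here
    events.append(f"{name}:received")
    return name


async def main():
    events = []
    # Both coroutines are started before either one finishes.
    await asyncio.gather(
        fake_request("list_distributions", 0.05, events),
        fake_request("other_work", 0.01, events),
    )
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The slower "request" does not hold up the faster coroutine, which is the behavior a blocking client cannot offer inside an event loop.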

In case it helps, aiodns provides an example of supporting asyncio on Python 2.6+ (see https://github.com/saghul/aiodns#python-versions).

If boto3 was already written with the assumption of blocking rather than asynchronous I/O throughout, I'm not sure how disruptive a change this would be.

Jc2k commented 9 years ago

I'm not that familiar with the internals of botocore, but I think the right approach is to create a custom version of endpoint.py and client.py that uses aiohttp under the hood and yields as you want.

I would probably start by subclassing Endpoint and making an aiohttp version of _send_request and _get_request. I'd also subclass ClientCreator and override _create_api_method (that is where the calls to the endpoint come from). I haven't yet figured out how I'd get from a service object to a client created with my ClientCreator subclass.

Does that sound right, @jamesls? I might have a go...

rdbhost commented 9 years ago

I've started a port of botocore to asyncio, at:

https://github.com/rdbhost/botocore

So far, the S3 integration tests pass, as do about 2/3 of the unit tests. I hope to have all tests converted and passing by a week from now.

The port depends on yieldfrom.requests and yieldfrom.urllib3, asyncio ports of the requests and urllib3 libraries.

jab commented 9 years ago

@rdbhost Awesome! Thanks for working on this. Definitely interested to follow your progress. @jamesls / botocore maintainers, had a chance to check this out / evaluate for merge potential?

koliyo commented 9 years ago

+1

AlexNigl commented 9 years ago

+1

rdbhost commented 9 years ago

This issue has gotten a couple of +1s in the last few days, so I thought I would point out that the work is done, at:

https://github.com/rdbhost/yieldfromBotocore

It implements an asyncio version of botocore, not boto3. I may eventually convert boto3, but my own needs have been satisfied by botocore, so other things now have priority.

David

amatthies commented 9 years ago

+1

jmehnle commented 9 years ago

@rdbhost, thanks for tackling that! It seems, however, that https://github.com/rdbhost/yieldfromBotocore is 271 commits behind upstream. Are you actively maintaining it?

rdbhost commented 9 years ago

My intention has been to do a merge when upstream reached v1.0; that seems to have happened without my noticing.

I will be merging commits from upstream this weekend.

r39132 commented 9 years ago

+1 @jamesls are you planning on pulling rdbhost's changes into botocore itself? We are very interested in this feature.

rdbhost commented 9 years ago

It is not a very promising candidate for merging back into botocore.

The changes are numerous, and the changes needed to make botocore functional within asyncio make it non-functional outside asyncio. I expect it to be a separate product indefinitely, with a parallel API, meaning an API as similar as possible within the asyncio constraints.
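A small sketch of why an asyncio-style method stops working for blocking callers. `describe_thing` is a hypothetical function, not part of botocore, and the example assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio
import inspect


async def describe_thing():
    # Hypothetical coroutine standing in for an asyncio-ported API method.
    await asyncio.sleep(0)
    return {"Status": "ok"}


def call_like_blocking_code():
    # A plain call does not run the body; it only creates a coroutine object.
    result = describe_thing()
    is_coroutine = inspect.iscoroutine(result)
    result.close()  # avoid a "never awaited" warning
    return is_coroutine


def call_with_event_loop():
    # The result only becomes available once an event loop drives the coroutine.
    return asyncio.run(describe_thing())


if __name__ == "__main__":
    print(call_like_blocking_code(), call_with_event_loop())
```

Existing synchronous callers would receive the coroutine object instead of a response, which is one reason a parallel API is easier than a single shared one.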

jamesls commented 9 years ago

One thing that also complicates things is that botocore supports as far back as python 2.6.5, so we'll need to figure out how we can support asyncio and still maintain py2 support. I see that there are asyncio backports to python2, so perhaps something could be done there.

jmehnle commented 9 years ago

Trollius (asyncio port to Python 2.x) is complete and stable.

harai commented 9 years ago

+1

jettify commented 9 years ago

I have a working port of botocore for asyncio, https://github.com/jettify/aiobotocore, which uses aiohttp for async HTTP requests. I am trying to reuse as much botocore code as possible, so I patched only a few classes and import the rest of the code; as a result the library has only a few hundred lines of code. This approach helps keep up with upstream, but the obvious downside is that I rely on internal interfaces, which are subject to change in new versions. The API is almost the same as botocore's; you just add yield from or await (Python 3.5) before calls.

For now I am using aiobotocore with S3 and have ported almost all of the S3 tests. I still need to work on pagination, since it is not easy to implement the iterator protocol with yield from.

mpaolini commented 9 years ago

+1 integrating aiobotocore might be doable

balihoo-gens commented 9 years ago

+1

mikeplavsky commented 8 years ago

+1

cgst commented 8 years ago

+1

thomascellerier commented 8 years ago

+1

lsbardel commented 8 years ago

OK guys, I've written a production-ready, and hopefully useful to the community, asyncio extension to botocore. The library is currently included in https://github.com/quantmind/pulsar-cloud in the asyncbotocore module, but it is self-contained and could be stripped out if needed.

The API is the same as botocore's, with the addition of an http_session keyword: an optional asyncio-compatible HTTP client (it must expose the _loop attribute) with an API similar to requests. As far as I know there are two such clients:

If used with pulsar, the library can also use greenlet in an implicit asynchronous fashion. Check https://github.com/quantmind/pulsar-cloud for more info.

By the way, thanks to @jettify for the initial effort, which I built on.

Feedback welcome!

asvetlov commented 8 years ago

@lsbardel yes, aiohttp.ClientSession has _loop but I don't encourage using private attributes. BTW we use https://github.com/jettify/aiobotocore as aiohttp-based library.

lsbardel commented 8 years ago

@asvetlov cool, I understand the private attribute thing, but the _loop attribute should be (almost) a standard for asyncio objects ;-) maybe we should ask Guido.

I would be happy to use https://github.com/jettify/aiobotocore as a low-level async botocore library, but currently it does not work with the pulsar HTTP client. At the moment it clearly requires aiohttp, while cloud.asyncbotocore does not require a specific asyncio library as such.

So maybe we should converge to a library that allows for different http clients?

lsbardel commented 8 years ago

By the way, all of that requires Python 3.4 or above.

asvetlov commented 8 years ago

@lsbardel During asyncio's active development, Guido was against a public loop attribute; his reasoning was that "user always may pass explicit loop if needed". I don't think the decision has changed since.

nadad commented 8 years ago

+1

jmorris0x0 commented 8 years ago

+1

frnkvieira commented 8 years ago

+1

Nath-P commented 8 years ago

+1

vinayan3 commented 8 years ago

+1

smorin commented 8 years ago

+1

smorin commented 8 years ago

How much work do you think it would take to add this functionality to boto3?

jamesls commented 8 years ago

Boto3's interesting because of the resources layer which can make multiple API calls, e.g:

import boto3

# Each loop below may trigger one or more blocking API calls behind the scenes.
s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    for key in bucket.objects.all():
        print(key.key)

What would the ideal async API look like for resources?

smorin commented 8 years ago

Ideally the implementation would allow for callbacks/extension points so people can integrate the non-blocking solution of their choice, whether gevent, asyncio, etc.

Exposing that can be tricky. Is boto3 using requests or something like that under the hood?

For example, prompt_toolkit didn't originally have async support but added the ability to be used in an event loop:

- http://python-prompt-toolkit.readthedocs.io/en/stable/pages/building_prompts.html#prompt-in-an-asyncio-application

Also here is an example:

Here is another example discussing: Async code design

rdbhost commented 8 years ago

Steve Morin:

If what you want is a way to make requests to AWS from an asyncio app, using an API very similar to botocore, look at:

https://github.com/rdbhost/yieldfromBotocore

It is based on an asyncio port of requests:

https://github.com/rdbhost/yieldfromRequests

The yieldfromBotocore library is not maintained anymore, as the boto3 API library was evolving too fast to keep up with, and I have other priorities. It does work on my production site, though, and possibly others.

Simply adding asyncio support to botocore itself is probably not viable, as the architectural differences between the asyncio and traditional blocking approaches are too profound to address both in one code base.

jettify commented 8 years ago

@jamesls Using Python 3.5 features, the API is pretty straightforward:

s3 = boto3.resource('s3')
async for bucket in s3.buckets.async_all():
    async for key in bucket.objects.async_all():
        print(key.key)
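
The `async_all()` name above is a proposal, not an existing boto3 method. As a rough sketch of how such an asynchronous iterator could hide pagination, the following uses an async generator (Python 3.6+) over a hypothetical in-memory service; `PAGES`, `fetch_page`, and `iter_keys` are all made up for illustration.

```python
import asyncio

# Hypothetical in-memory "service": each page has keys and an optional next token.
PAGES = {
    None: {"Keys": ["a.txt", "b.txt"], "NextToken": "p2"},
    "p2": {"Keys": ["c.txt"], "NextToken": None},
}


async def fetch_page(token):
    # Stand-in for one awaited API call; a real client would do network I/O here.
    await asyncio.sleep(0)
    return PAGES[token]


async def iter_keys():
    # Async generator: fetches the next page only when the consumer needs it.
    token = None
    while True:
        page = await fetch_page(token)
        for key in page["Keys"]:
            yield key
        token = page["NextToken"]
        if token is None:
            break


async def main():
    return [key async for key in iter_keys()]


if __name__ == "__main__":
    print(asyncio.run(main()))
```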

jettify commented 8 years ago

@smorin prompt_toolkit uses threads for some blocking calls. You can do the same with boto: just create a big enough ThreadPoolExecutor and use it for each boto call that performs I/O. This solution will work absolutely fine (with some exceptions, of course).
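
A minimal sketch of the thread-pool approach described above, assuming Python 3.7+. `blocking_call` is a hypothetical stand-in for a blocking boto call (it only sleeps), and the pool size of 4 is an arbitrary choice.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(name):
    # Hypothetical stand-in for a blocking boto call that performs I/O.
    time.sleep(0.05)
    return f"{name}: done"


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each blocking call runs in a worker thread; the event loop stays free.
        futures = [
            loop.run_in_executor(pool, blocking_call, name)
            for name in ("get_object", "put_object")
        ]
        # gather returns results in the order the futures were passed in.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The trade-off is one thread per in-flight call, so the pool size caps concurrency.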

smorin commented 8 years ago

I am looking to run this on AWS Lambda, so I am trying to avoid threads, both for that reason and because of the performance penalty.

foobarna commented 8 years ago

+1

bmamouri commented 8 years ago

I'm downloading files from S3 in my Lambda function, and my Lambda cost is extremely high because the boto3 file download operation is blocking. I am really looking forward to a way of doing this that works in Python 2.7, without asyncio.

JohnDzialo commented 8 years ago

+1

floostmodern commented 8 years ago

+1

graingert commented 8 years ago

Twisted now supports asyncio. Twisted supports both Python 2 and Python 3.

If you ported botocore to twisted you could expose a blocking wrapper in boto3, and users could use botocore asynchronously via asyncio and twisted in Python 3 and just twisted in Python 2.

jettify commented 8 years ago

@graingert I do not think that is the best approach.

A better approach is to move the request preparation and response parsing logic out of the I/O library; as a result, anyone can add support for a new framework, whether async or sync (https://sans-io.readthedocs.io/). botocore is doing a good job there: I managed to hack asyncio support into aiobotocore very quickly and with very little code.
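
A toy sketch of the sans-I/O split described above. None of these names come from botocore: `build_request`, `parse_response`, and `fake_transport` are hypothetical, and the "transport" just echoes the operation name instead of touching the network.

```python
import asyncio
import json


def build_request(operation, params):
    # Pure function: prepares the request without doing any I/O.
    body = json.dumps({"Operation": operation, **params})
    return {"method": "POST", "path": "/", "body": body}


def parse_response(status, body):
    # Pure function: turns raw status and bytes-like text into a result.
    if status != 200:
        raise RuntimeError(f"request failed with status {status}")
    return json.loads(body)


def fake_transport(request):
    # Hypothetical transport that echoes the operation instead of using a socket.
    payload = json.loads(request["body"])
    return 200, json.dumps({"Echo": payload["Operation"]})


def call_sync(operation, **params):
    # Blocking front end: a real one would use a blocking HTTP client here.
    status, body = fake_transport(build_request(operation, params))
    return parse_response(status, body)


async def call_async(operation, **params):
    # Async front end reusing the same pure functions.
    request = build_request(operation, params)
    await asyncio.sleep(0)  # stand-in for awaiting a real async HTTP client
    status, body = fake_transport(request)
    return parse_response(status, body)


if __name__ == "__main__":
    print(call_sync("ListBuckets"), asyncio.run(call_async("ListBuckets")))
```

Only the thin front ends differ between the blocking and asyncio versions; the protocol logic is shared.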

graingert commented 8 years ago

@jettify yeah that's an even better solution.

lsbardel commented 8 years ago

Updated pulsar-cloud to be compatible with botocore 1.4.61

Feedback welcome

atif1996 commented 7 years ago

+1

pfreixes commented 7 years ago

There is a new thread open on the aiobotocore side [1] with a proposal to align botocore and aiobotocore. We would like to get more opinions, especially from botocore maintainers such as @jamesls.

[1] https://github.com/aio-libs/aiobotocore/issues/213

ravidbro commented 6 years ago

+1