encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License
13.03k stars 829 forks source link

Supporting Sync. Done right. #572

Closed tomchristie closed 4 years ago

tomchristie commented 4 years ago

So, with the 0.8 release we dropped our sync interface completely.

The approach of bridging a sync interface onto an underlying async one, had some issues, in particular:

The good news is that I think the unasync approach that the hip team are working with is actually a much better tack onto this, and is something that I think httpx can adopt.

We'll need to decide:

(Essentially the same discussion as https://github.com/python-trio/hip/issues/168)

One option here could be...

So eg...

Sync:

client = httpx.sync.Client()
client.get(...)

Async:

client = httpx.async.Client()
await client.get(...)

Update

Here's how the naming/scoping could look...

Top level:

response = httpx.get(...)
response = httpx.post(...)
response = httpx.request(...)
with httpx.stream(...) as response:
    ...
response = await httpx.aget(...)
response = await httpx.apost(...)
response = await httpx.arequest(...)
async with httpx.astream(...) as response:
    ...

Client usage:

client = httpx.SyncClient()
client = httpx.AsyncClient()

(Unlike our previous approach there's no Client case there, and no subclassing. Just a pure async implementation, and a pure sync implementation.)

We can just have a single Request class, that'd accept whatever types on init and expose both streaming and async streaming interfaces to be used by the dispatcher, that raises errors in invalid cases eg. the sync streaming case is called, but it was passed an async iterable for body data.

Most operations would return a plain ol' Response class, with no I/O available on it. See https://github.com/encode/httpx/issues/588#issuecomment-562057229 When using .stream, either an AsyncResponse or a SyncResponse would be returned.

There'd be sync and async variants of the dispatcher interface and implementations, but that wouldn't be part of the 1.0 API spec. Neither would the backend API.

tomchristie commented 4 years ago

Ah, actually I guess that might not quite be okay, given that async is a keyword?

Probably workarounds available, but could be problematic.

Anyway's there's:

florimondmanca commented 4 years ago

IMO sync should be the second class citizen, not async. Not sticking to this means going backwards wrt discussions and conclusions we’ve drawn from #522.

So, another option: keep the same API, but have it live under httpx.sync.

In short...

>>> import httpx.sync as httpx
phillmac commented 4 years ago

I get that this is not stable software, but this was a bit of a rude shock when all my tests broke.

I completely disagree that "sync should be the second class citizen". Imho 90% of use cases will be fetch this page then fetch the next and so on... Having to shoehorn strictly synchronous processes to async seems to be a bit of a pain in the backside to me.

tomchristie commented 4 years ago

@phillmac I hear you competely. We've had a big pain point here in that our prior approach to sync support needed to be ripped out in order to change how we handle it.

Having cleared that out the way, I now absolutely want to support the sync and async cases both equally. We've got a bit of technical stuff to do to get that done, and we've also got the naming decisions here to resolve, but yes it'll be on the way back in.

I suggest pinning your httpx dependency to 0.7.x, and waiting a couple of weeks to see where we're at on this.

phillmac commented 4 years ago

Cool, my fault for not pinning it earlier, but I wanted bleeding edge stuff. Guess I got what I asked for

tomchristie commented 4 years ago

I should be slightly more precise here - one thing that may well differ is that our sync API may not support HTTP/2. (I see you mention that in the thread in #522.)

phillmac commented 4 years ago

Yeah that's the whole point, I have a http/2 api server, there seems not to be any just bog standard http/2 libraries 😞

lundberg commented 4 years ago

I've been playing around with this a little now...took an other approach. Might be too magic though, but wanted to throw it out for discussion anyways ;-)

Idea is to use the same high-level api for both sync and async by reverting the api methods get(), post() etc back to regular sync variants, but instead returning the coroutine from request().

On top of that they could be decorated with a new decorator that evaluates if we're in an async context or not. If we're in async context, just return the coroutine which is awaitable by the user, but for sync context, help the user by running the coroutine in a new, or reusable?, asyncio loop.

Tested example:

# api.py

def awaitable(func: Callable) -> Callable:
    def wrapper(
        *args: Any, **kwargs: Any
    ) -> Union[Response, Coroutine[Any, Any, Response]]:
        coroutine = func(*args, **kwargs)
        try:
            # Detect if we're in async context
            sniffio.current_async_library()
        except sniffio.AsyncLibraryNotFoundError:
            # Not in async context, run coroutine in a new event loop.
            return asyncio.run(coroutine)
        else:
            # We're in async context, return the coroutine for await usage
            return coroutine

    return wrapper

async def request(...) -> Response:
    async with Client(...) as client:
        return await client.request(...)

@awaitable
def get(...) -> Union[Response, Coroutine[Any, Any, Response]]:
    return request(...)

# test_api.py
def test_sync_get():
    response = httpx.get(...)

@pytest.mark.asyncio
def test_async_get():
    response = await httpx.get(...)

Pros are that it doen't change the current api, meaning it's still awaitable coroutines returned that can be used with asyncio.gather etc. Cons are that it gives a sad extra context-check overhead, if not solved some way, cache?, and also maybe harder to document.

To get rid of the context check, a setting could be introduced instead, i.e. httpx.sync = True

Thoughts?

StephenBrown2 commented 4 years ago

Aside from asyncio.run() being Py3.7+ only, and my needing to run httpx on 3.6, that looks quite nifty.

lundberg commented 4 years ago

Aside from asyncio.run() being Py3.7+ only, and my needing to run httpx on 3.6, that looks quite nifty.

True, the asyncio.run() should probably be more clever and also reuse the created helper loop.

tomchristie commented 4 years ago

It's a neat approach. Potentially not great for folks using mypy or other type checkers against their codebase.

lundberg commented 4 years ago

@tomchristie do you have any experience/thoughts on sniffio.current_async_library() giving us too much (unnecessary) overhead for the "primary" async use case?

tomchristie commented 4 years ago

@lundberg Not something I've looked into, no. Any particular indication that might be an issue?

tomchristie commented 4 years ago

I've updated the issue description with my thoughts on how the naming/scoping could look here.

lundberg commented 4 years ago

Any particular indication that might be an issue?

No other than that it feels wrong that I had to do the check on every api call.

Potentially not great for folks using mypy or other type checkers against their codebase.

We could supply a typing shorthand for this, not as nice as Response, but still.

florimondmanca commented 4 years ago

Potentially not great for folks using mypy or other type checkers against their codebase.

Can confirm: mypy would disallow awaiting the return value of @awaitable-decorated functions because all types in the Union aren't awaitable. Also, I had to go to great lengths to make the decorator please mypy… 😄

Full snippet:

import asyncio
import typing

import httpx
import sniffio

T = typing.TypeVar("T")

def awaitable(
    func: typing.Callable[..., typing.Union[T, typing.Awaitable[T]]]
) -> typing.Callable[..., typing.Union[T, typing.Awaitable[T]]]:
    def wrapper(*args: typing.Any, **kwargs: typing.Any) -> typing.Union[T, typing.Awaitable[T]]:
        coroutine = func(*args, **kwargs)
        try:
            # Detect if we're in async context
            sniffio.current_async_library()
        except sniffio.AsyncLibraryNotFoundError:
            # Not in async context, run coroutine in a new event loop.
            coroutine = typing.cast(typing.Awaitable[T], coroutine)
            return asyncio.run(coroutine)
        else:
            # We're in async context, return the coroutine for await usage
            return coroutine

    return wrapper

async def request(method: str, url: str) -> httpx.Response:
    async with httpx.Client() as client:
        return await client.request(method, url)

@awaitable
def get(url: str) -> typing.Union[httpx.Response, typing.Coroutine[typing.Any, typing.Any, httpx.Response]]:
    return request('GET', url)

async def main() -> None:
    _ = await get('https://example.org')
$ mypy example.py
_debug/app.py:42: error: Incompatible types in "await" (actual type "Union[Response, Awaitable[Response]]", expected type "Awaitable[Any]")

While this is smart, I think the implicitness induced by having to look at the context around the function call makes it a no-no for me. On the contrary, an API such as .get()/.aget() makes it easy to know at-a-glance whether a coroutine or a plain value is returned (while pleasing type checkers more easily).

florimondmanca commented 4 years ago

@tomchristie A few questions regarding the update in the issue description.

A) Do we want AsyncClient to have .get(), .post(), etc, or to match the top-level API with .aget(), .apost(), etc? (I'm inclined towards the latter, for consistency and also knowing at a glance that we're using an async client without having to call it async_client or analyze the surrounding context.)

B) Re: responses, as I understand Response would be the default, won't-do-any-I/O, simple container response class, while SyncResponse would be for sync-streaming responses, and AsyncResponse would be for async-streaming responses, correct?

I'm thinking it might be possible to take advantage of the fact that with cmand async with acm are two distinct protocols, with their own pair of dunder methods. We could have:

class StreamingResponse:
    def __enter__(self):
        assert is_sync_iterator(self.content)
        ...

    def __exit__(self, *args):
        ...

    def __aenter__(self):
        assert is_async_iterator(self.content)
        ...

    def __aexit__(self, *args):
        ...

This way:

And we don't have to create sync/async classes just for the streaming case. :-)

(Caveat: we'd need to do the same kind of checks for the .stream_*()/.astream_*() methods.)

lundberg commented 4 years ago

Also, I had to go to great lengths to make the decorator please mypy...

Nice try though @florimondmanca 😅

Also really agree that it's more clear with separate sync/async apis like get/aget etc. But even though my rough idea, read think outside the box, probably is not the way to go, it does keep the core code base untouched, in other words no need for sync/async implementations like before.

Still think it would be nice to keep the core async, and "add" sync support/help on top. Maybe just so, a sync-wrapper api, maybe even go all the way with opt-in install pip install httpx[sync] 😉

florimondmanca commented 4 years ago

There's still the "code-gen the sync version from the async one" approach that we've been thinking about seeing what folks from hip are doing… It's not 100% obvious to me it's still the best approach, though. 💭

tomchristie commented 4 years ago

Do we want AsyncClient to have .get(), .post(), etc, or to match the top-level API with .aget(), .apost(), etc?

Ah, that’s a good point. I’d not considered that those two wouldn’t mirror each other. At that point I’d probably just say “let’s not have the top level API for async.” Different constraints - the top level API is a convenience case, and there’s really no good reason not to always use a client instance in the async case.

And yeah neat point about the StreamingContext being able to handle both the async and sync cases in the same implementation. 👍

lundberg commented 4 years ago

Couldn't let go of my attempt of having one high-level api... 😄

New idea is to skip the sync/async context checking and instead wrap the coroutine from request() in a FutureResponse, allowing proper await, but also adds a sync send() helper.

Example:

# models.py
class FutureResponse(Awaitable[Response]):
    def __init__(self, coroutine):
        self.coroutine = coroutine

    def __await__(self):
        return self.coroutine.__await__()

    def send(self):
        # Alternative names; result, run, resolve, call, __call__, ?
        # Should be more clever here and reuse sync helper loop and support py3.6
        return asyncio.run(self.coroutine)
# api.py
async def request(...) -> Response:
    async with Client(...) as client:
        return await client.request(...)

def get(...) -> FutureResponse:
    return FutureResponse(request(...))
# test_api.py
@pytest.mark.asyncio
async def test_async_get():
    response = await httpx.get(...)

def test_sync_get():
    response = httpx.get(...).send()

Thoughts?

florimondmanca commented 4 years ago

That is outrageously clever, @lundberg. 🤯

On trio you'd have to some a weird trio.run(lambda: self.coroutine) (since trio.run() expects a callable that returns a Coroutine), but yeah, it's a fun one!

lundberg commented 4 years ago

On trio you'd have to some a weird trio.run(lambda: self.coroutine) (since trio.run() expects a callable that returns a Coroutine), but yeah, it's a fun one!

Do you mean that FutureResponse.send() also would need to support trio @florimondmanca ?

If so, I personally think that it doesn't need to be that clever, since if you have trio installed, you're probably already in an async context, or know what you're doing.

Additionally, about the parallel api namespace approach, .get() / .aget(). If thats the road, I think they should be named carefully to favour async, meaning if we drop, or stuff away, sync support in the future. Personally, I'd like to see httpx.get() to be the async one, to allow simpler drop of sync api's, and leave the async ones untouched, making upgrade HTTPX a lot easier for async users. In the future, everything will be async, right? 😉

tomchristie commented 4 years ago

I'mma have to curtail the fun here I think. 😅

Let's not be clever.

@florimondmanca made a good point about aget() etc not mirroring the client methods, so I think we'll just keep things really simple. Standard ol' thread-concurrency API for the top level convienence case, great with the REPL etc.

The SyncClient and AsyncClient for when you're actually being precise about things. There's no good enough reason to support async without an explicit client instance.

florimondmanca commented 4 years ago

@tomchristie So, about the SyncClient/AsyncClient distinction — as I understand, we're going towards dropping the code-gen approach here, correct? Or is this only a discussion about API for now, and we'll see how to bring that API to life later?

florimondmanca commented 4 years ago

Also, I'm not too wild on enforcing the usage of a client in the async case.

One of the killer features of Requests is the ability to just import requests and .get() your way through things. This is far enough for most use cases. Yes, it sets up a connection pool and tears it down every time (doesn't it?), but most people don't care that much about performance (and when they do, they can switch to a client).

I'd really like to be able to just import httpx about await httpx.get() (or .aget() (?)).

In short, I'm in favor of an async top-level API with the rationale that the top-level API is not only convenient for the REPL use cases.

imbolc commented 4 years ago

import httpx.sync as httpx

Why should they be in the same package then? Won't it be easier to have a separate httpx_sync package and just keep compatibility on API level. This may reveal that most of the people use only one of them. E.g. if there would be a lack of resources to maintain both, it would be easier to decide which one to drop. The common staff could obviously go into a separate httpx_base package.

imbolc commented 4 years ago

I completely disagree that "sync should be the second class citizen". Imho 90% of use cases will be fetch this page then fetch the next and so on...

@phillmac I'd even say its 99% and await keyword here is exactly for this purpose :)

imbolc commented 4 years ago

@lundberg with the @awaitable api:

  1. Can't there be some edge cases you'd intentionally want to run stuff synchronously from an async context?
  2. Can't we go a bit further and resolve the coroutine inside awaitable instead of returning it? This way api would be completely the same, without even a need to add the await keyword. It will also eliminate these silent errors if we forget to add await. We still can get an error if we add an unnecessary await, but it won't be silent anymore.
florimondmanca commented 4 years ago

So, regarding the implementation of a sync variant, I guess one thing we agree on at this point is that the sync version must be purely sync, right? So, no asyncio.run() calls, just an end-to-end, sync-only, socket-based impementation.

While there's the question of how to syncify the Request and Response interfaces (we've already gathered some hints about that in here), I'm thinking about what should happen regarding the dispatchers, esp. ConnectionPool and HTTPConnection, as well as the Client.

Option 1) is to re-implement everything in a sync variant, so SyncClient, SyncDispatcher base class, SyncConnectionPool, SyncHTTPConnection, etc. Obviously that's a lot of code, so that's when we get to talk about option 2): code generation; i.e. generating the sync variant from the async one.

I had some fun today putting together a sample package that uses unasync to code-gen a sync version of itself: hello-unasync.

The main goal was to see how the development experience looked like, and how to set things up to get mypy, flake8, and code coverage to work with code generation.

Getting code coverage to work locally and on CI was a bit involved, but overall besides the compilation step there isn't much that changes.

I must say the development experience is less pleasing/straight-forward than for a simple "it's all in the source tree" package, but having the sync version "maintain itself" is really handy.

Wondering what applying this approach to HTTPX would involve in practice. My thoughts are this would require the following…

  1. Move async code that should be code-gen'd to an _async sub-package. So this means Request, Response, Client, dispatchers, middleware, the high-level API, and the ConcurrencyBackend base class (so that we can code-gen a sync variant for SyncBackend to use).
  2. Add a SyncBackend (based on #525).
  3. Update auto-detection in lookup_backend to detect when we're in the _sync part of the code and automatically use SyncBackend in that case.
  4. Update our toolchain.
  5. Profit?

That's quite a lot of upfront work, but it's not clear to me yet which of 1) re-implementing sync from async or 2) setting up code generation would be the best. I guess both have their pro's and con's…

Is there anything we can do/try to see whether codegen would fit us well? Maybe only tackle syncifying HTTPConnection? (Which would already require syncifying HTTP11Connectionmaybe HTTP2Connection can stay as-is for now.)

lundberg commented 4 years ago
  1. Can't there be some edge cases you'd intentionally want to run stuff synchronously from an async context?

I guess there might be, i.e. a 3rd-p-package limiting to a regular def @imbolc. The sync send() helper would need to handle catching existing loop and reuse that one.

  1. Can't we go a bit further and resolve the coroutine inside awaitable instead of returning it?

IMO, that would limit async features, i.e. gather etc, or passing the coroutine around before awaiting.

lundberg commented 4 years ago

I had some fun today putting together a sample package that uses unasync to code-gen a sync version of itself: hello-unasync.

Nice to see this alternative as well @florimondmanca 😅

As we know, a lot of libraries and frameworks are struggling with the same stuff, and tackling it in different ways. Django released 3.0, ~ a week ago, with asgi- and some async support. Until Django is fully async, their solution is async_to_sync.

florimondmanca commented 4 years ago

Until Django is fully async, their solution is async_to_sync.

Yup. Though I think this is the kind of "bridging" approach we had been trying to use here before (only via a custom implementation), i.e. running wrapping an async operation in an asyncio.run() call, with extra care not to run an event loop in a thread where there's one running. It's brought us a wealth of issues (see the beginning of this issue description), and I believe we don't want to go that path anymore.

lundberg commented 4 years ago

... and I believe we don't want to go that path anymore.

Totally agree @florimondmanca. Just wanted to highlight the alternative.

But having said that, it's still a nice way of keeping the code base tight, either solution and alternative used, to keep the core async and "just" wrap it for sync use.

imbolc commented 4 years ago

one thing that may well differ is that our sync API may not support HTTP/2

Is there any reason for bringing back the sync version at all If it would be the case? How would it differ from the requests?

tomchristie commented 4 years ago

Actually turns out that I think our Sync variant will support HTTP/2. (After our latest refactoring we don't actually need background tasks in the HTTP/2 case.)

How would it differ from the requests?