Colin-b / httpx_auth

Authentication classes to be used with httpx
MIT License
114 stars 26 forks source link

thoughts on async implementation #72

Open rafalkrupinski opened 7 months ago

rafalkrupinski commented 7 months ago

Thank you @Colin-b and all contributors for this awesome package!

Just wanted to share some thoughts on possibility of async OAuth2 flow. I don't know all the corner cases this package handles, so please let me know whenever I write something naive.

  1. OAuth2 flows use a global cache with a (threading) lock. It's a problem for async code (https://github.com/Colin-b/httpx_auth/pull/48#issuecomment-1031194339), but also doesn't seem necessary. The token and lock could be simply instance variables. The only problem that that I can think of, is when user creates multiple instances with the same auth server and key. Is there a valid case for such usage?

  2. Actually the cache holds two locks, one for accessing cache, the other for refreshing the lock. Again this seem unnecessary:

    • when token is valid, the first concurrent call can acquire lock, check expiry and release it, while others wait for a short time.
    • when token is expired, the first concurrent call can acquire the lock, check expiry and refresh the token, while others have to wait until it finishes
  3. The OAuth2 flows use locks within .auth_flow() method, against the Auth documentation.

    If the authentication scheme does I/O such as disk access or network calls, or uses synchronization primitives such as locks, you should override .sync_auth_flow() and/or .async_auth_flow() instead of .auth_flow() to provide specialized implementations that will be used by Client and AsyncClient respectively.

    This is addressed in #48 patch. Having two locks makes sense when storage and refresh were split. In this case acquiring the refresh lock could be pulled to .a/sync_auth_flow().

  4. .auth_flow(), via .request_new_token() uses own instance of httpx.Client. Again, it doesn't seem necessary. Httpx already provides a/sync-portable protocol for making HTTP requests from Auth instances: response = yield request.

    If the auth server needs different transport options than the target server, mounts can be used. And if authentication is needed by the authentication server (meta-auth, client_auth param), the yielded requests need to be processed by meta-auth first.

  5. Since all OAuth2 implementations in this package follow the pattern:

    1. try getting cached token
    2. if expired, fallback to self.request_new_grant()
    3. set header

    steps 1+2 run with a lock, and 3 is very lightweight, the entire flow can run with a single lock.

When the above are addressed, supporting both sync and async should be as simple as implementing this class:

class LockingRefreshingAuth(Auth):
    def __init__(self):
        self._sync_lock = threading.Lock()
        self._async_lock = anyio.Lock()

    def sync_auth_flow(self, request: Request) -> Generator[Request, Response, None]:
        if self.requires_request_body:
            request.read()

        with self._sync_lock:
            # apply the loop over `self.auth_flow()`
            flow = self.auth_flow(request)
            ...

    async def async_auth_flow(self, request: Request) -> AsyncGenerator[Request, Response]:
        # like above, but async

Implementations would still yield their auth requests instead of directly using Client, and yield the user request before returning.


I've forked the repo to work on a POC. If it makes sense, and it's welcome, I'll make a PR. Meanwhile, comments are more than welcome.


edit 1: I've just noticed that auth token refresh requests also need to have authentication headers, and I guess that's why a client is used. But I still think it's unnecessary, and simply another Auth instance can be used

edit 2: It makes sense to have two locks if one lock protects a single Auth instance and the other the global cache.

Colin-b commented 7 months ago

Hello @rafalkrupinski I am currently looking at finally introducing async coverage in the test suite. It requires some refactoring of the test suite, I'll keep you posted, once it's done I will consider reviewing your suggestions and the PR that is opened for quite some time already on this.

Thanks a lot for your patience

rafalkrupinski commented 7 months ago

I've just pushed a commit that removes the inner Client from Auth classes - https://github.com/python-lapidary/httpx_auth_async/tree/feature/async. If you like the direction I can separate the flows to sync and async versions.

edit: The implementation itroduces meta-auth - using Auth object to authenticate auth flow requests. In other words, when Auth object needs to make authenticated request to the auth server, another auth object is used.

Because of that, and because auth flows need to read responses (which may be either sync or async), there isn't that much of implementation to share between a/sync versions.

It's possible that I know OAuth2 too little, and a simpler implementation is possible, e.g. if only HTTP Basic authentication is ever used, it could be simply done in the OAuth class, without using meta-auth.

Cheers

Colin-b commented 7 months ago

The test suite is now ensuring non regression on async flows as well. Feel free to submit small scale focused PRs and I will gladly review.

The client is a parameter that is necessary because some users might have rules that requires specific proxy settings, certificates or whatever and this is not the goal of this package to reproduce every feature httpx offers to cover those cases, so we delegate to the user, the responsibility to provide a properly configured client in case they need to.

rafalkrupinski commented 7 months ago

Is it something mounts can't handle?

Colin-b commented 7 months ago

I am not sure to follow, if the client has custom rules to achieve auth and an additional set of rules to reach the authenticated endpoint, how would you do it without providing N parameters to every auth class that would need to be forwarded to the inner client of the auth class? Let's say your certificate differs and your proxy differs (mounts can cover proxy on client side, but not certificate). And this is only an example I can think of, I am sure we would have to cover a whole lot of use cases, corporate networks are a nightmare ;)

rafalkrupinski commented 7 months ago

if the client has custom rules to achieve auth and an additional set of rules to reach the authenticated endpoint, how would you do it without providing N parameters to every auth

How large is N? Cookies, headers, query params, body... perhaps a single parameter request_args: dict? Proxy and cert should be handled by mounts...

mounts can cover proxy on client side, but not certificate

This doesn't work?

httpx.AsyncClient(mounts={'https://example.com', httpx.AsyncHTTPTransport(verify=ssl.SSLContext(...) or cert=...)})

And this is only an example I can think of, I am sure we would have to cover a whole lot of use cases, corporate networks are a nightmare ;)

Would be a great help if I knew some corner cases.

Colin-b commented 7 months ago

You are right, I guess it's possible to do everything via mounts, though you would admit it's not as straightforward to provide for a client. Anyway I will gladly review your proposal on async. If you could come up with the smallest possible enhancements that would help the review and increase the chances for it to get merged.

Thanks again !