Colin-b / httpx_auth

Authentication classes to be used with httpx
MIT License
115 stars 26 forks source link

OAuth2ClientCredentials client id and secret are not taken into account to distinguish tokens #97

Open LeonardHd opened 6 days ago

LeonardHd commented 6 days ago

Hi @Colin-b thanks for this library - it helps people implementing OAuth flows which is great! I think we might have discovered an issue that causes unintended or unexpected token reuse across requests/auth objects.

The good news is we found an option that prevents it with the current version (see Considered options to mitigate the issue) but that might not be reasonable defaults (at least for the OAuth2ClientCredentials) or not very obvious for users.

Let me know what you think and what you think about potential solutions to the problem.

Problem

When using the auth parameter (for example with OAuth2ClientCredentials) to authenticate with a service, the global token cache is shared across all auth objects for the same token urls. This can lead to unintended token reuse across different auth objects, for example, when using Microsoft Entra Id (i.e., Azure AD) to authenticate with different services via OAuth2.

This could cause security issues because (not exhaustive):

Above behaviour might not be expected by the user and could lead to security issues.

Sample to reproduce

I have written a few pytest test cases to demonstrate the issue.

# token_reuse_example.py

"""Example of unexpected token reuse in httpx_auth.

How to reproduce:
1. Install the required dependencies:
    $ pip install httpx pytest httpx_auth
2. Create a .env file with the following content:
    # Use a OAUTH protected endpoint or webhook.io to check what Bearer is used.
    OAUTH_PROTECTED_ENDPOINT="https://graph.microsoft.com/v1.0/servicePrincipals/{REPLACE_WITH_OBJECT_ID_OF_ENTERPRISE_APP}" (or any other OAuth2 protected endpoint)
    CLIENT_SECRET="XXXXXXXX"
    CLIENT_ID="XXXXXXXX"
    # Replace with a tenant id
    TOKEN_URL="https://login.microsoftonline.com/XXXXXXXX-XXXX-XXXX-XXXX-XXX#XXXXXXXXX/oauth2/v2.0/token"
    # Optional: Add a second client_id and client_secret to test with multiple clients
    CLIENT_2_ID="<CLIENT_2_ID>"
    CLIENT_2_SECRET="<CLIENT_2_SECRET>"
3. Run the following commands:
    $ source .env && export $(cat .env | xargs)

4. Run the test:
    $ pytest token_reuse_example.py
"""

import os
from hashlib import sha512

import httpx
import pytest
from httpx_auth import OAuth2ClientCredentials
from httpx_auth._errors import InvalidGrantRequest
from httpx_auth._oauth2.common import OAuth2

OAUTH_PROTECTED_EP = os.getenv("OAUTH_PROTECTED_ENDPOINT")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_2_ID = os.getenv("CLIENT_2_ID", "<CLIENT_2_ID>")
CLIENT_2_SECRET = os.getenv("CLIENT_2_SECRET", "<CLIENT_2_SECRET>")
TOKEN_URL = os.getenv("TOKEN_URL")

def test_current_behaviour():
    """Test to demonstrate current behaviour.

    The token is reused between requests and auth objects
    even if client_id and client_secret are different.
    """

    OAuth2.token_cache.clear()  # Ensure the cache is empty (due to the different test cases)

    with httpx.Client() as client:
        auth_c1 = OAuth2ClientCredentials(
            TOKEN_URL,
            client_id=CLIENT_ID,
            client_secret=CLIENT_SECRET,
            scope=".default",
        )

        response = client.get(OAUTH_PROTECTED_EP, auth=auth_c1)
        assert response.status_code == 200

    with httpx.Client() as client:
        auth_c1 = OAuth2ClientCredentials(
            TOKEN_URL,
            client_id="XXXXXXX",
            client_secret=CLIENT_SECRET,
            scope=".default",
        )

        response = client.get(OAUTH_PROTECTED_EP, auth=auth_c1)
        assert response.status_code == 200

    with httpx.Client() as client:
        auth_c1 = OAuth2ClientCredentials(
            TOKEN_URL, client_id=CLIENT_ID, client_secret="XXXXXXXXXX", scope=".default"
        )

        response = client.get(OAUTH_PROTECTED_EP, auth=auth_c1)
        assert response.status_code == 200

def test_current_behaviour_multiple_clients():
    """Test to demonstrate current behaviour with multiple clients.

    If the same TOKEN_URL is used, the token is reused between requests and auth objects.

    If CLIENT_2_ID and CLIENT_2_SECRET are provided, the test will use a different client_id and client_secret.
    Yet, the token fetched with `auth_c1` is reused between requests and auth objects.

    If CLIENT_2_ID and CLIENT_2_SECRET are not provided, the test will use default values that are invalid.
    Yet, the token from the first request is reused for the second request.
    """
    OAuth2.token_cache.clear()  # Ensure the cache is empty (due to the different test cases)

    with httpx.Client() as client_1, httpx.Client() as client_2:
        auth_c1 = OAuth2ClientCredentials(
            TOKEN_URL,
            client_id=CLIENT_ID,
            client_secret=CLIENT_SECRET,
            scope=".default",
        )
        auth_c2 = OAuth2ClientCredentials(
            TOKEN_URL,
            client_id=CLIENT_2_ID,
            client_secret=CLIENT_2_SECRET,
            scope=".default",
        )

        response = client_1.get(OAUTH_PROTECTED_EP, auth=auth_c1)
        assert response.status_code == 200

        response = client_2.get(OAUTH_PROTECTED_EP, auth=auth_c2)
        assert response.status_code == 200

def test_avoid_shared_cache_for_current_behaviour():
    """Test to demonstrate a workaround for the current behaviour.

    As the TokenCache uses the `self.state` instance variable to compute the cache key a
    current workaround is to use pass an additional kwarg parameter to the OAuth2ClientCredentials.

    We build a hash from the client_id and client_secret and pass it as as the `_client_id_secret_hash`
    argument to enforce the cache key to be different for different client_id and client_secret combinations.
    """

    OAuth2.token_cache.clear()  # Ensure the cache is empty (due to the different test cases)

    with httpx.Client() as client:
        _client_id_secret_hash = build_client_id_secret_hash(CLIENT_ID, CLIENT_SECRET)
        auth = OAuth2ClientCredentials(
            TOKEN_URL,
            client_id=CLIENT_ID,
            client_secret=CLIENT_SECRET,
            scope=".default",
            _client_id_secret_hash=_client_id_secret_hash,
        )

        response = client.get(OAUTH_PROTECTED_EP, auth=auth)
        assert response.status_code == 200

    with httpx.Client() as client:
        _client_id_secret_hash = build_client_id_secret_hash(CLIENT_ID, "XXXXXXXXXX")
        auth = OAuth2ClientCredentials(
            TOKEN_URL,
            client_id=CLIENT_ID,
            client_secret="XXXXXXXXXX",
            scope=".default",
            _client_id_secret_hash=_client_id_secret_hash,
        )

        with pytest.raises(InvalidGrantRequest):
            client.get(OAUTH_PROTECTED_EP, auth=auth)

    with httpx.Client() as client:
        _client_id_secret_hash = build_client_id_secret_hash("XXXXXXX", CLIENT_SECRET)
        auth = OAuth2ClientCredentials(
            TOKEN_URL,
            client_id="XXXXXXX",
            client_secret=CLIENT_SECRET,
            scope=".default",
            _client_id_secret_hash=_client_id_secret_hash,
        )

        with pytest.raises(InvalidGrantRequest):
            client.get(OAUTH_PROTECTED_EP, auth=auth)

def build_client_id_secret_hash(client_id: str, client_secret: str) -> str:
    return sha512(f"{client_id}{client_secret}".encode("unicode_escape")).hexdigest()

Considered options to mitigate the issue

I propose the following options to mitigate the issue. Please note that the following might not be exhaustive and there might be other ways to mitigate the issue.

  1. Warn the user about the global token cache and the cache key implementation and the potential issues it might cause more explicitly in the documentation and the code (e.g., in the docstring of the OAuth2ClientCredentials class). Point them to passing a hash like we did in the sample with build_client_id_secret_hash.

  2. Tie the token cache to the auth object. This would mean that the token cache is not shared across different auth objects.

    class OAuth2BaseAuth(abc.ABC, httpx.Auth):
        # ...
    
        def auth_flow(
            self, request: httpx.Request
        ) -> Generator[httpx.Request, httpx.Response, None]:
            # change to (by instantiating the token_cache in the __init__ method):
            token = self.token_cache.get_token(
                self.state,
                early_expiry=self.early_expiry,
                on_missing_token=self.request_new_token,
                on_expired_token=self.refresh_token,
            )
            # instead of:
            # token = OAuth2.token_cache.get_token(
            #     self.state,
            #     early_expiry=self.early_expiry,
            #     on_missing_token=self.request_new_token,
            #     on_expired_token=self.refresh_token,
            # )
            self._update_user_request(request, token)
            yield request
        # ...
Colin-b commented 6 days ago

Hi,

Thanks for reporting. There is indeed an issue with this specific flow when only client id and client secret differs and the workaround is to send additional useless parameters to the URL.

The global cache is on purpose and the fix will be to add credentials to the key in some way, the same way it is done in some other auths for instance.

Given that there is a workaround I don't see myself handling this in the near future but this is a valid issue in the backlog !

asanglard commented 4 days ago

Hi,

Thanks for reporting. There is indeed an issue with this specific flow when only client id and client secret differs and the workaround is to send additional useless parameters to the URL.

The global cache is on purpose and the fix will be to add credentials to the key in some way, the same way it is done in some other auths for instance.

Given that there is a workaround I don't see myself handling this in the near future but this is a valid issue in the backlog !

Hi,

IMO, this issue is quite critical in term of security, let me explain my reasoning.

If someone uses the library with two differents client_id/client_secrent with the same token url, the current implementation can be highly misleading as it brings unexpected behaviour and it could allow to fetch an unexpected token from an other "client/session". It can have a strong impact on a business.

Taking into consideration that the current design of the cache does not support usage of different client_id, client_secret and it has a high security concern, I think this issue should be a top priority.

Concerning the workaround, it is not super clean... The endpoint of the token url may reject additional query param. It will depend on the flexibility of the implementation of the endpoint. It can work at one moment, but no guarantee on long term.