feat: integration authz headers in Dirac client

aldbr commented 1 year ago

This PR enhances the autorest Dirac client by encapsulating configuration options such as the authorization headers. The primary purpose is to simplify the code that CLI developers have to write.

Writing a new CLI command currently looks like this:

async with Dirac(endpoint=get_diracx_preferences().url) as api:
    jobs = await api.jobs.search(
        parameters=None if all else parameter,
        search=condition if condition else None,
        headers=get_auth_headers(),
    )

With this PR:

async with Dirac() as api:
    jobs = await api.jobs.search(
        parameters=None if all else parameter,
        search=condition if condition else None,
    )

Note: the Dirac client can refresh the access token automatically.

Chosen solution

The solution is based on the azure.core architecture. We redefine 3 classes:

Dirac: encapsulates the endpoint and an authentication policy
AsyncTokenCredential: provides OAuth tokens
AsyncBearerTokenCredentialPolicy: an authentication policy that adds the authorization headers to the requests

This solution was chosen after thorough research and analysis, as it was deemed (by myself) the simplest way to incorporate authorization headers into requests.
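
The division of labour between the three classes can be sketched without any dependency. This is only an illustration of the pattern, not the code of this PR and not the azure.core API: the `Sketch*` class names, the `refresh_count` attribute, the `leeway` parameter and the placeholder endpoint are all assumptions made for the example, and the token is fabricated locally instead of being fetched from a token endpoint.

```python
import asyncio
import time
from typing import Dict, NamedTuple, Optional


class AccessToken(NamedTuple):
    """Stand-in for an OAuth access token and its expiry (epoch seconds)."""

    token: str
    expires_on: int


class SketchTokenCredential:
    """Hypothetical credential: returns a cached token, renewing it near expiry."""

    def __init__(self, lifetime: int = 300, leeway: int = 30) -> None:
        self._lifetime = lifetime
        self._leeway = leeway
        self._cached: Optional[AccessToken] = None
        self.refresh_count = 0

    async def get_token(self) -> AccessToken:
        now = int(time.time())
        if self._cached is None or self._cached.expires_on - now <= self._leeway:
            # A real credential would contact the token endpoint here.
            self.refresh_count += 1
            self._cached = AccessToken(
                f"token-{self.refresh_count}", now + self._lifetime
            )
        return self._cached


class SketchBearerTokenPolicy:
    """Hypothetical policy: adds the Authorization header to each request."""

    def __init__(self, credential: SketchTokenCredential) -> None:
        self._credential = credential

    async def on_request(self, headers: Dict[str, str]) -> None:
        token = await self._credential.get_token()
        headers["Authorization"] = f"Bearer {token.token}"


class SketchClient:
    """Hypothetical client: owns the endpoint and the authentication policy."""

    def __init__(self, endpoint: str = "https://diracx.invalid") -> None:
        self.endpoint = endpoint  # placeholder URL, an assumption of this sketch
        self._policy = SketchBearerTokenPolicy(SketchTokenCredential())

    async def __aenter__(self) -> "SketchClient":
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        return None

    async def search(self) -> Dict[str, str]:
        headers: Dict[str, str] = {}
        await self._policy.on_request(headers)
        # A real client would send the request; the sketch returns the headers.
        return headers


async def demo() -> Dict[str, str]:
    async with SketchClient() as api:
        return await api.search()
```

The caller never touches the headers: `asyncio.run(demo())` returns a dictionary that already carries the `Authorization` entry, which is the simplification the PR is after.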

Considered alternatives

`azure-identity`

The Python autorest documentation recommends using a credential type from the azure-identity package to initialize the client.

DeviceCodeCredential seems suited to our use case. It can obtain a token using the device code flow, cache it, and refresh it automatically when needed.

Problem: it is tightly coupled with Azure applications. DeviceCodeCredential inherits from InteractiveCredential, which inherits from MsalCredential. There are mandatory non-standard parameters such as tenant_id. Setting it to "" or None does not help. The only option would have been to create new classes inheriting from the mentioned classes, which would have been inefficient.

`AzureAD` authentication library

microsoft-authentication-library-for-python seems to provide a generic OAuth2 library, but it does not automatically refresh access tokens. Furthermore, it would probably be tricky to integrate with the Dirac autorest client.

`authlib` in combination with `azure.core`

authlib would help manage tokens. It can even refresh access tokens automatically. But the library would be of limited use in our context, as the Dirac client is initialized each time a new command starts.
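
To make the last point concrete: when the client is re-created for every command, any token state kept in memory is lost between invocations, so the token has to live somewhere persistent. The sketch below shows one possible way to do that with a JSON file. It is an assumption made for illustration, not necessarily what this PR does; the function names, the file layout (`access_token`, `expires_on`) and the `fetch` callable are all hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional, Tuple


def load_cached_token(path: Path, leeway: int = 30) -> Optional[dict]:
    """Return the token stored at `path`, or None if missing or about to expire."""
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    if data["expires_on"] - time.time() <= leeway:
        return None
    return data


def save_token(path: Path, access_token: str, expires_in: int) -> dict:
    """Store the token with an absolute expiry time (epoch seconds)."""
    data = {
        "access_token": access_token,
        "expires_on": int(time.time()) + expires_in,
    }
    path.write_text(json.dumps(data))
    return data


def get_token_for_command(path: Path, fetch: Callable[[], Tuple[str, int]]) -> str:
    """Reuse the stored token if still valid, otherwise fetch and store a new one.

    `fetch` is a hypothetical callable returning (access_token, expires_in);
    in practice it would run a refresh or a login flow.
    """
    cached = load_cached_token(path)
    if cached is not None:
        return cached["access_token"]
    access_token, expires_in = fetch()
    save_token(path, access_token, expires_in)
    return access_token
```

Two successive commands pointing at the same file would call `fetch` only once, as long as the stored token is not within `leeway` seconds of its expiry.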

chrisburr commented 1 year ago

I think the general approach looks good and like the right way to go 👍

I'll do a specific review once you figure out the CI.

aldbr commented 1 year ago

mypy was tough but probably fair.

Here are the changes I performed to make mypy happy:

Solved src/diracx/cli/__init__.py:32: error: "Dirac" has no attribute "login"
- renamed the patched Dirac client to DiracClient
- copied Dirac().__aenter__() into DiracClient()
Solved src/diracx/client/aio/_patch.py:51: error: Signature of "get_token" incompatible with supertype "AsyncTokenCredential" [override]
- copied AsyncTokenCredential().close()/__aenter__()/__aexit__() into DiracAsyncTokenCredential()
- added the suggested signature to DiracAsyncTokenCredential().__aexit__() with ... as default values

It is just for info in case there are better solutions.

The failure in the DIRAC integration tests is expected; I am going to make a PR once we are okay with the structure of this one.

Update: mypy now complains about the redefinition of DiracBearerTokenCredentialPolicy()._token, but I don't see how I could do this differently since I can't modify self._token.expires_on, which is a read-only attribute.
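
For readers unfamiliar with the constraint, the sketch below reproduces it with a stand-in type. It assumes the token object behaves like a `NamedTuple` with `token` and `expires_on` fields (worth checking against the installed azure.core version); the `AccessToken` class here is a local stand-in and `with_expiry` is a hypothetical helper. It shows why in-place assignment fails and how a modified copy can be built instead. It does not by itself settle the mypy complaint about the attribute redefinition.

```python
from typing import NamedTuple


class AccessToken(NamedTuple):
    """Local stand-in with the two fields discussed above."""

    token: str
    expires_on: int


def with_expiry(token: AccessToken, expires_on: int) -> AccessToken:
    """Return a copy of `token` with a new expiry; the original is unchanged."""
    return token._replace(expires_on=expires_on)


original = AccessToken("abc", 100)

try:
    # NamedTuple fields are read-only, so this assignment raises AttributeError.
    original.expires_on = 200  # type: ignore[misc]
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False

# Building a new tuple is the supported way to "change" a field.
updated = with_expiry(original, 200)
```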

aldbr commented 1 year ago

The issue with the integration tests is related to the normal (non-aio) Dirac client, which is used by DIRAC and which is actually not implemented here. The easiest option would be to almost duplicate the content of the aio Dirac client in diracx/client/_patch.py. I will see whether there is an easy way to avoid duplicating too much code and work on that as soon as possible.

chrisburr commented 1 year ago

To get the CI green again I've pushed a workaround to main; this commit should be reverted: 2c70a6787

aldbr commented 1 year ago

I'm adding a comment to explain the latest changes because it is becoming hard to review and I am sorry for that:

I added a sync version of DiracClient in src/diracx/client/_patch.py. I created a few methods to factor out the code shared by the sync and async clients.
I had to make another PR on DIRAC to take into account a few changes I've made here: https://github.com/DIRACGrid/DIRAC/pull/7205
- While doing this, I took the opportunity to remove the state parameter from the TokenResponse pydantic model because it is not part of the OAuth2 RFC and it is currently not used. I had to regenerate the client to make mypy happy but it generated a large number of changes in the client files.
