crashvb / docker-registry-client-async

An AIOHTTP based Python REST client for the Docker Registry.
Apache License 2.0

Drop token_based_endpoints #26

Closed skrech closed 2 years ago

skrech commented 2 years ago

Hello again crashvb,

I'm writing this to suggest that the token_based_endpoints config should be dropped and that Token Authentication should become the default.

To support this suggestion I'll submit some facts as of today:

  1. Some registries, such as Docker.io and Quay.io, don't support Basic Authentication directly at all. By directly, I mean supplying credentials to the registry itself rather than to its auth server (e.g. auth.docker.io/token). The Docker registry documentation doesn't even mention that one can use Basic Auth to authenticate directly to a registry -- the only documented way is the Token auth method.
  2. Some registry servers, such as Harbor, do support direct Basic Auth. However, the default (when no Authorization header is supplied) is token-based auth. This effectively means that every registry not explicitly known to DRCA (by setting explicit creds) will fail anonymous pull requests. One interesting case is trying to force token auth for Harbor -- it won't work because of this line.
  3. There is no registry server that I know of that doesn't support token-based auth.
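To make fact (1) concrete, here's a minimal, hypothetical sketch (not DRCA code) of the first step of the Token auth handshake: parsing the Bearer challenge that a registry like docker.io returns on an unauthenticated request, which tells the client which auth server (realm) the primary credentials should actually go to.

```python
import re


def parse_bearer_challenge(header: str) -> dict:
    """Parse a 'Www-Authenticate: Bearer ...' challenge into its parameters.

    Returns e.g. {"realm": ..., "service": ..., "scope": ...}; the realm is
    the token endpoint the credentials go to, not the registry itself.
    """
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"not a Bearer challenge: {scheme}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


challenge = parse_bearer_challenge(
    'Bearer realm="https://auth.docker.io/token",'
    'service="registry.docker.io",scope="repository:library/ubuntu:pull"'
)
# challenge["realm"] -> "https://auth.docker.io/token"
```

The client would then GET the realm URL (with Basic credentials, if it has any) to obtain the scoped token, and retry the registry request with an `Authorization: Bearer <token>` header.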

One more related thing that I want to mention is the following use case: at our company we have several Docker registries, all of which have their user identities synced. So there might be many registries, but I can connect to all of them with the same credentials. Every team is free to spin up a registry for its own purposes, and it's very common for them to want to integrate with automation tools/services that need to access the registry. With the current explicit credentials-per-endpoint approach, this automation tooling (which uses DRCA) has to know in advance exactly which endpoints it will operate against, which is very limiting. It'd be great if there were default creds to be used when no explicit configuration exists for an endpoint.
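The fallback behaviour being proposed could be sketched roughly like this (all names here, including CredentialStore, are invented for illustration and are not DRCA's API): endpoint-specific entries are tried first, and a shared default is returned only when nothing matches.

```python
import re
from typing import Optional


class CredentialStore:
    """Illustrative credential lookup with a default fallback."""

    def __init__(self, default: Optional[str] = None):
        self._entries: list = []
        self._default = default  # used when no explicit entry matches

    def add(self, pattern: str, credentials: str) -> None:
        self._entries.append((re.compile(pattern), credentials))

    def resolve(self, endpoint: str) -> Optional[str]:
        for pattern, credentials in self._entries:
            if pattern.search(endpoint):
                return credentials
        return self._default  # None would mean "anonymous"


store = CredentialStore(default="sso-user:sso-pass")
store.add(r"^harbor\.example\.com$", "robot:token")

store.resolve("harbor.example.com")         # endpoint-specific credentials
store.resolve("team-registry.example.com")  # falls back to the shared creds
```

As noted above, whether sending the default credentials to any unmatched (potentially untrusted) registry is safe is a separate security question.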

crashvb commented 2 years ago

You make a compelling argument and this area was developed without a clear design / pattern; most of the effort went into testing against a library/registry image with a basic authentication backend, followed by dockerhub, then quay.io, then OCR on-prem ... which left it kinda hackish. Let's use this issue to track a revamp.

While it's being developed, it should be possible to seed the default credentials store with either SSO / bearer tokens, or with the desired combination of individually managed endpoints as a workaround; but, I haven't traced through all the combinations / edge cases for token retrieval w/ and w/o basic auth.

This one might take a little longer as the documentation for each auth endpoint appears to be quite scattered, and implementation specific, but in the end it should be possible to produce either a generalized solution, or a modular framework with some built-in defaults for popular endpoints.
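As a rough illustration of the "modular framework with some built-in defaults" idea (every name here is hypothetical, not an existing DRCA interface): auth handlers could be registered per endpoint pattern, with a generic token-auth handler used for anything unrecognized.

```python
import re

# (pattern, handler-name) pairs; a real implementation would register
# handler objects rather than strings.
HANDLERS: list = []


def register(pattern: str, handler: str) -> None:
    HANDLERS.append((re.compile(pattern), handler))


def handler_for(endpoint: str, default: str = "generic-token-auth") -> str:
    """Pick the first registered handler whose pattern matches the endpoint."""
    for pattern, handler in HANDLERS:
        if pattern.search(endpoint):
            return handler
    return default


# Built-in defaults for popular endpoints:
register(r"(^|\.)docker\.io$", "dockerhub-token-auth")
register(r"(^|\.)quay\.io$", "quay-token-auth")
```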

skrech commented 2 years ago

While it's being developed, it should be possible to seed the default credentials store with either SSO / bearer tokens, or with the desired combination of individually managed endpoints as a workaround; but, I haven't traced through all the combinations / edge cases for token retrieval w/ and w/o basic auth.

I didn't quite get what you mean -- I don't think there is currently a way to set default credentials, because every credential is mapped to a specific endpoint; there is no credential for the "no match" case.

I'd be very happy to help with testing for private repository access with Harbor, because that's what we're using. However, I don't have access to private repo on docker.io :/

crashvb commented 2 years ago

No worries; I think I am a little confused on the use case.

My understanding is that most registry / auth combinations leverage some existing secret (basic auth credentials, certificates, other tokens, SSO) to authenticate to the auth endpoint and issue a (scoped) token for interacting with the registry. Some of them also integrate additional metadata / secrets into location redirects when referencing foreign layer locations ... mainly that a primary authentication method is required to receive the secondary bearer token.

Currently, DRCA leverages the de facto default ~/.docker/config.json location (with env override, and add_credentials) to attempt to retrieve the primary credentials, and an includelist of endpoints to attempt to retrieve the secondary token; all other endpoints are assumed to be anonymous.

From the description above, this design (or lack thereof ;) ) prevents three things: 1) It prevents externally authenticated secondary bearer tokens from being provided without specifying the primary credentials and attempting to retrieve the token directly (bug). 2) It does not provide a method by which a default set of credentials can be used -- unmatched endpoints are assumed to be authenticated (bug / enhancement). 3) The method used to match endpoints is rigid and does not allow end users to define a pattern or matching method (enhancement).

(Here's where I think I'm missing things, assumptions enumerated) ... Assuming the use case:

(A) multiple registries (disjoint) / multiple registries (mirrored)
(B) all using the same authentication scheme / auth backend (SSO, LDAP, domain authenticated? ... I don't need to know which one)
(C) all issuing secondary tokens to restrict access to content that is either disjoint or replicated

If these assumptions hold true, wouldn't each registry be issuing tokens to end users that are only valid within the context of that one instance? (i.e., if using docker-cli, end users would need to docker login to each endpoint individually ... which would create endpoint-specific entries in the local credentials store.) If it were a more complex topology with multiple HA instances and an LB in front, then I think the assumption would fail, but there would also be token replication on the backend.

Until the fix is implemented, the suggested workaround from above was to make the endpoint-specific (primary) credentials available within the store or via add_credentials, but going over the code again I'm not sure that's possible with the line you pointed out mandating basic auth =/. In the short term there is always inheritance / duck punching, or I can push out an intermediate solution that refactors the get_credentials method into the default for a developer-exposed callback (bring-your-own-auth), with the intention of fixing the three issues listed above in the mid term.

skrech commented 2 years ago

My understanding is that most registry / auth combinations leverage some existing secret (basic auth credentials, certificates, other tokens, SSO) to authenticate to the auth endpoint and issue a (scoped) token for interacting with the registry. Some of them also integrate additional metadata / secrets into location redirects when referencing foreign layer locations ... mainly that a primary authentication method is required to receive the secondary bearer token.

I think your understanding is correct.

Currently, DRCA leverages the de facto default ~/.docker/config.json location (with env override, and add_credentials) to attempt to retrieve the primary credentials, and an includelist of endpoints to attempt to retrieve the secondary token; all other endpoints are assumed to be anonymous.

This is how DRCA operates now. However, it means DRCA is capable of authenticating only to an explicitly enumerated list of endpoints known beforehand. I suggest that DRCA also support a default set of credentials, used when no endpoint from the enumerated ones matches. If default creds are supplied, then all requests would be assumed to be authenticated rather than anonymous. Now that I think of it... this is up for discussion from a security standpoint -- is it safe to send credentials to unknown registries, which might be potentially dangerous...

From the description above, this design (or lack thereof ;) ) prevents three things: 1) It prevents externally authenticated secondary bearer tokens from being provided without specifying the primary credentials and attempting to retrieve the token directly (bug). 2) It does not provide a method by which a default set of credentials can be used -- unmatched endpoints are assumed to be authenticated (bug / enhancement). 3) The method used to match endpoints is rigid and does not allow end users to define a pattern or matching method (enhancement).

  1. I'm not versed enough to say whether tokens should be supplied directly.
  2. ..."unmatched endpoints are assumed to be anonymous". This issue is about whether we need to change the assumption to authenticated when using the default credentials.
  3. This would be great.

(Here's where I think I'm missing things, assumptions enumerated) ... Assuming the use case:

(A) multiple registries (disjoint) / multiple registries (mirrored)
(B) all using the same authentication scheme / auth backend (SSO, LDAP, domain authenticated? ... I don't need to know which one)
(C) all issuing secondary tokens to restrict access to content that is either disjoint or replicated

If these assumptions hold true, wouldn't each registry be issuing tokens to end users that are only valid within the context of that one instance? (i.e., if using docker-cli, end users would need to docker login to each endpoint individually ... which would create endpoint-specific entries in the local credentials store.) If it were a more complex topology with multiple HA instances and an LB in front, then I think the assumption would fail, but there would also be token replication on the backend.

Until the fix is implemented, the suggested workaround from above was to make the endpoint-specific (primary) credentials available within the store or via add_credentials, but going over the code again I'm not sure that's possible with the line you pointed out mandating basic auth =/. In the short term there is always inheritance / duck punching, or I can push out an intermediate solution that refactors the get_credentials method into the default for a developer-exposed callback (bring-your-own-auth), with the intention of fixing the three issues listed above in the mid term.

For default credentials, I mean only the primary credentials. I don't think the scoped tokens should be shared across registries (I don't know if it's even possible).

I should note something for clarification:

  1. When using Basic Auth we have only "primary credentials", e.g. username and password.
  2. When using Token Auth then we have two types of credentials -- primary and secondary (scoped tokens).

One thing is for sure, though: when using Token Auth, DRCA shouldn't supply our primary creds to the registry! It should only supply primary creds to the auth server. So this line shouldn't be there, IMO.
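The rule being argued for here can be stated as a tiny, hypothetical helper (not DRCA code, and the signature is invented): Basic credentials go only to the auth server's realm, while the registry itself only ever sees the Bearer token, or nothing at all.

```python
from typing import Optional


def auth_header_for(
    url: str, realm: str, basic: str, token: Optional[str]
) -> Optional[str]:
    """Choose the Authorization header value for a request to `url`.

    `realm` is the token endpoint from the Bearer challenge; `basic` is the
    base64-encoded primary credentials; `token` is the scoped secondary token.
    """
    if url.startswith(realm):
        return f"Basic {basic}"   # primary creds: auth server only
    if token is not None:
        return f"Bearer {token}"  # registry only ever sees the token
    return None                   # anonymous: let the registry challenge us
```

Under this rule the registry never receives the primary credentials, which is exactly why the line referenced above is problematic.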

crashvb commented 2 years ago

Ok, @ecolvin and I were able to spend some time on this today and got something working. The line passing basic auth credentials to the registry was removed; however, the ability to pass basic auth credentials to the authorization service was maintained because certain providers (not blue Hat) require this. Also, there's a flag that was defaulted to True to hopefully not break things.

It should be possible to pass a "default" pattern when adding primary credentials as follows:

import re

from docker_registry_client_async import DockerRegistryClientAsync

async with DockerRegistryClientAsync(fallback_basic_auth=False) as docker_registry_client_async:
    await docker_registry_client_async.add_credentials(credentials=..., endpoint=re.compile(r"^.*$"))

... or, if the token is already known, likewise:

async with DockerRegistryClientAsync(fallback_basic_auth=False) as docker_registry_client_async:
    await docker_registry_client_async.add_token(endpoint=re.compile(r"^.*$"), scope=..., token=...)