gocd / kubernetes-elastic-agents

Kubernetes Elastic agent plugin for GoCD
https://www.gocd.org
Apache License 2.0

Incorrect credentials being used when creating Kubernetes Client #347

Closed JamesMcNee closed 1 year ago

JamesMcNee commented 1 year ago

Hello,

We are seeing an issue when going from version v3.8.2-350 of the plugin to v3.8.4-408 where, intermittently, the 'wrong' cluster credentials are used when pods are created. The credentials being used are those of the GoCD server rather than the ones created specifically for the agents. We know this because we see an error message like the one below (some info redacted).

Failed to create agent pod: Failure executing: POST at: https://<ip-addr>/api/v1/namespaces/gocd-agents/pods. 

Message: Forbidden! Configured service account doesn't have access. Service account may have been revoked. pods is forbidden: 

User "<gocd-server-service-account>" cannot create resource "pods" in API group "" in the namespace "gocd-agents".

Doing some digging in the changes between these two versions I think the issue is due to an upgrade of the underlying Kubernetes client from version 5.12.2 to 5.12.4.

Between these versions, they have back-ported some functionality to automatically 'refresh' tokens, which reads credentials from the well-known in-cluster locations. The issue is that this picks up the credentials for the server, rather than the agent credentials that are passed into the plugin.

From a cursory glance I could not find a way to disable this auto-refresh mechanism, but I am far from having done a deep dive.
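For context, the in-cluster credentials that an auto-refresh mechanism would pick up live at a well-known path mounted into every pod (the service-account token, alongside the namespace and CA cert, under `/var/run/secrets/kubernetes.io/serviceaccount/`). A minimal sketch of that lookup, with the path parameterised so it can be exercised outside a cluster; this is an illustration of the mechanism, not the client's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class InClusterToken {
    // Well-known mount point Kubernetes uses for the pod's service-account token.
    public static final String DEFAULT_TOKEN_PATH =
            "/var/run/secrets/kubernetes.io/serviceaccount/token";

    // Returns the token if the file is readable. An auto-refresh that always
    // re-reads this path hands back the *server pod's* identity, regardless of
    // any token the plugin was explicitly configured with -- hence the
    // Forbidden errors above.
    public static Optional<String> read(Path tokenPath) {
        try {
            if (Files.isReadable(tokenPath)) {
                return Optional.of(Files.readString(tokenPath).trim());
            }
        } catch (IOException e) {
            // Treat unreadable files the same as missing ones: no in-cluster token.
        }
        return Optional.empty();
    }
}
```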

chadlwilson commented 1 year ago

Thanks for the digging - yikes, this sounds no good.

What are the requirements to replicate this?

JamesMcNee commented 1 year ago

Hi @chadlwilson -- Thanks for getting back so quickly! Indeed, no good!

Those requirements that you list are correct and all that I can think of to replicate this behaviour; Kubernetes version should be irrelevant for replicating.

Something I forgot to clarify in the above post, I said 'intermittently' using the incorrect credentials -- My belief is that this is due to the refresh occurring every 1 minute and the clients being recycled every 10 by the plugin. So there are periods (e.g. when the server first starts the plugin initialises) where it is within the first minute and therefore skips the refresh.

JamesMcNee commented 1 year ago

@chadlwilson More digging done :D

I believe this is the same issue that we are seeing (raised on the underlying client's repo).

There is a workaround suggested on the issue, which is to use the `oauthTokenProvider` mechanism rather than supplying a token directly. This method takes an `OAuthTokenProvider`, which is just an interface with a get-token method on it, so it can be used essentially as a `Supplier<String>`.

So it should just be a case of changing this line: `.withOauthToken(pluginSettings.getSecurityToken())` to `.withOauthTokenProvider(() -> pluginSettings.getSecurityToken())`.
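The difference the one-line change makes can be sketched in a self-contained way. The stand-in interface below has the same single-method shape as fabric8's `OAuthTokenProvider`, and the `Settings` class is a hypothetical stand-in for the plugin's settings object, not the plugin's actual class: a static token is captured once at build time, while a provider is consulted on every call, so the client always sees the current plugin-configured credential.

```java
import java.util.concurrent.atomic.AtomicReference;

public class TokenProviderSketch {
    // Stand-in for fabric8's OAuthTokenProvider: one getToken() method.
    @FunctionalInterface
    public interface OAuthTokenProvider { String getToken(); }

    // Hypothetical stand-in for the plugin settings (the real object exposes
    // getSecurityToken()).
    public static class Settings {
        private final AtomicReference<String> securityToken = new AtomicReference<>();
        public Settings(String token) { securityToken.set(token); }
        public String getSecurityToken() { return securityToken.get(); }
        public void setSecurityToken(String token) { securityToken.set(token); }
    }

    public static void main(String[] args) {
        Settings settings = new Settings("agent-token");

        // Eager: value captured once, like withOauthToken(...).
        String staticToken = settings.getSecurityToken();

        // Lazy: consulted on each request, like withOauthTokenProvider(...).
        OAuthTokenProvider provider = settings::getSecurityToken;

        settings.setSecurityToken("rotated-agent-token");

        System.out.println(staticToken);          // still "agent-token"
        System.out.println(provider.getToken());  // "rotated-agent-token"
    }
}
```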

Will give this a try: I'll build a version of the plugin locally and try it on our test server. Hopefully I'll get time for this tomorrow; if it's successful, how would you feel about a PR?

chadlwilson commented 1 year ago

Sure, a PR is welcome. We'd need to understand whether that's the best way, or whether we're better off turning off autoconfiguration entirely, as some others in that thread seem to have done?

JamesMcNee commented 1 year ago

Unfortunately neither of the suggestions on that issue worked 😞

Current theory is that they only work with the latest version (6.x) and not with the backports to 5.x, but I have not yet validated this.

JamesMcNee commented 1 year ago

Hey @chadlwilson -- I have raised a PR updating the plugin to the latest version of the kubernetes-client.

Tested both disabling auto configuration completely on this version and using the token provider; both work, but I went with disabling auto configuration.
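The effect of disabling auto configuration can be shown with a toy model of the credential precedence (this is illustrative, not fabric8's actual configuration code; in fabric8 6.x the real knob is building the client from a config that was never auto-configured, e.g. starting from `Config.empty()` instead of letting the builder read the well-known in-cluster files):

```java
import java.util.Optional;

public class CredentialSelection {
    // Toy model: with auto-configure on, a discovered in-cluster token can
    // displace the explicitly configured one; with it off, the token the
    // plugin was given is the only candidate.
    public static String effectiveToken(String configuredToken,
                                        Optional<String> inClusterToken,
                                        boolean autoConfigure) {
        if (autoConfigure && inClusterToken.isPresent()) {
            // Auto-refresh re-reads the in-cluster file: this is the GoCD
            // server's service account, producing the Forbidden errors above.
            return inClusterToken.get();
        }
        return configuredToken; // the agent token configured in the plugin
    }
}
```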

Verified against our development GoCD server.