coder / envbuilder

Build development environments from a Dockerfile on Docker, Kubernetes, and OpenShift. Enable developers to modify their development environment quickly.
Apache License 2.0

401 error for requests to coder.example.com after "Update" option following template change #262

Open kujenga opened 3 months ago

kujenga commented 3 months ago

I am seeing an issue where, when I use the "Update" option in the coder dashboard after a template change, the coder process gets stuck in a state where it gets 401 errors trying to talk to the coder agent URL. The issue is resolved by using the "Restart" option instead.

2024-06-12 19:43:20.920 [info]  connecting to coderd
2024-06-12 19:43:20.934 [warn]  run exited with error ...
    error= GET https://coder.example.com/api/v2/workspaceagents/me/rpc?version=2.1: unexpected status code 401: unexpected non-JSON response "": Try logging in using 'coder login'.
               Error: no response body

I am using "token" authentication with the coder agent, with config passed to the container for the auth as:

      {
        name  = "CODER_AGENT_TOKEN"
        value = try(coder_agent.main[0].token, "")
      },
      {
        name  = "CODER_AGENT_URL"
        value = data.coder_workspace.me.access_url
      },

This issue goes away after doing a "restart" of the instance.

Discord discussion here: https://discord.com/channels/747933592273027093/1250538421789790271/1250538421789790271

johnstcn commented 3 months ago

@kujenga what version of Coder are you seeing this behaviour on?

kujenga commented 2 months ago

@johnstcn We are still seeing this issue on Coder version v2.13.0+56bf386

Also, just to clarify: the example.com in the post is a stand-in for the internal hostname of the coder instance I'm running; the real hostname is correct in the logs.

johnstcn commented 2 months ago

@kujenga I think this is mainly down to a combination of confusing wording in the Coder UI and a lack of clarity on what the 'update' and 'restart' buttons actually do. The 'restart' button is probably what you want instead of 'update'. See https://github.com/coder/coder/issues/13539#issuecomment-2218033354 for some more context.

When a workspace is started, an agent token is generated. This token is linked to the currently active workspace build. What appears to be happening is that the 'update' button is creating a workspace build in the 'start' state, which results in a new agent token being created and the existing agent token being revoked.

My guess would be that your template, for some reason, isn't updating the workspace VM / container with the updated agent token?
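To make the hypothesis above concrete, here is a toy model of the token lifecycle. It is only a sketch under stated assumptions, not Coder's actual implementation: the `ToyCoderd` class, its methods, and the `container_env` dictionary are hypothetical names invented for illustration. It assumes that each new 'start' build replaces the agent token, and that the container keeps whatever `CODER_AGENT_TOKEN` it was originally created with until it is recreated.

```python
import secrets


class ToyCoderd:
    """Toy stand-in for the server side; NOT Coder's real implementation."""

    def __init__(self):
        self.active_token = None

    def new_start_build(self):
        # Assumption: each 'start' build issues a fresh agent token,
        # which implicitly revokes the previous one.
        self.active_token = secrets.token_hex(16)
        return self.active_token

    def agent_request(self, token):
        # Only the token tied to the currently active build is accepted.
        if token is not None and token == self.active_token:
            return 200
        return 401


server = ToyCoderd()

# Initial start: the container receives the token via CODER_AGENT_TOKEN.
container_env = {"CODER_AGENT_TOKEN": server.new_start_build()}
status_before = server.agent_request(container_env["CODER_AGENT_TOKEN"])

# 'Update' creates a new 'start' build, but (hypothetically) the running
# container is not recreated, so its environment still holds the old token.
new_token = server.new_start_build()
status_stale = server.agent_request(container_env["CODER_AGENT_TOKEN"])

# Recreating the container with the new token restores access.
container_env["CODER_AGENT_TOKEN"] = new_token
status_after = server.agent_request(container_env["CODER_AGENT_TOKEN"])

print(status_before, status_stale, status_after)
```

If this model matches what is happening, the 401 would persist until the container's environment picks up the new token, which would be consistent with 'Restart' clearing the error.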

Do you see the same behaviour if you 'Change Version' of an active workspace instead of clicking 'Update'?