coder / coder

Provision remote development environments via Terraform
https://coder.com
GNU Affero General Public License v3.0
7.97k stars 667 forks source link

flake: `TestWorkspaceProxyWorkspaceApps_BlockDirect/ReconnectingPTY/SignedTokenQueryParameter` #12789

Closed dannykopping closed 5 months ago

dannykopping commented 6 months ago

As seen here: https://github.com/coder/coder/actions/runs/8463660814/job/23186948453?pr=12788

=== FAIL: enterprise/wsproxy TestWorkspaceProxyWorkspaceApps_BlockDirect/ReconnectingPTY/SignedTokenQueryParameter (11.24s)
                ...
            Error Trace:    /home/runner/actions-runner/_work/coder/coder/coderd/workspaceapps/apptest/apptest.go:1770
                                        /home/runner/actions-runner/_work/coder/coder/coderd/workspaceapps/apptest/apptest.go:99
            Error:          Condition never satisfied
            Test:           TestWorkspaceProxyWorkspaceApps_BlockDirect/ReconnectingPTY/SignedTokenQueryParameter
dannykopping commented 6 months ago

@spikecurtis based on the possible relation to https://github.com/coder/coder/issues/9764, I'm assigning you to have a look. If that's incorrect, please LMK

spikecurtis commented 5 months ago

I'm pretty sure the issue here is that there is a race between the PTY session ending and flushing the stats.

    t.go:99: 2024-03-28 07:07:56.192 [debu]  coderd: GET  host=localhost:44927  path=/api/v2/workspaces/1ea65379-e045-40f0-acea-b3a6e115f13e  proto=HTTP/1.1  remote_addr=127.0.0.1  start="2024-03-28T07:07:56.182000677Z"  took=10.489382ms  status_code=200  latency_ms=10  request_id=7fab4296-63e8-4b95-ac83-96c4410b2f84
    t.go:99: 2024-03-28 07:07:56.192 [debu]  coderd.workspaceapps.stats_collector: flushing workspace app stats
    t.go:99: 2024-03-28 07:07:56.192 [debu]  coderd.workspaceapps.stats_collector: flushed workspace app stats
    t.go:99: 2024-03-28 07:07:56.194 [debu]  workspaceapps: pty Bicopy finished  server_url=http://localhost:41069  agent_id=5321f01b-1dfd-484b-8aae-f976f12bfa77  request_id=86bc1ff0-5bc7-45b2-b3d9-107339d1f0ae
    t.go:99: 2024-03-28 07:07:56.194 [debu]  coderd.workspaceapps.stats_collector: collected workspace app stats  report={"user_id":"cffc37ad-b8c2-4fa8-a9cb-c3e46d229bf3","workspace_id":"1ea65379-e045-40f0-acea-b3a6e115f13e","agent_id":"5321f01b-1dfd-484b-8aae-f976f12bfa77","access_method":"terminal","slug_or_port":"","session_id":"3ffa5cff-185c-435a-99b6-f671ea85dba0","session_started_at":"2024-03-28T07:07:56.085132Z","session_ended_at":"2024-03-28T07:07:56.194987Z","requests":1}
    t.go:99: 2024-03-28 07:07:56.195 [debu]  net.workspace-proxy: GET  server_url=http://localhost:41069  host=localhost:41069  path=/api/v2/workspaceagents/5321f01b-1dfd-484b-8aae-f976f12bfa77/pty  proto=HTTP/1.1  remote_addr=127.0.0.1  start="2024-03-28T07:07:56.01621909Z"  took=178.822837ms  status_code=101  latency_ms=178  request_id=86bc1ff0-5bc7-45b2-b3d9-107339d1f0ae

This shows the initial GET to determine the baseline LastUsedAt, then the flush, then the new stats coming in after the flush, and finally the HTTP log for the PTY session coming to an end. So, we flushed before we recorded the PTY session, and so the LastUsedAt isn't bumped until the next flush.