Open dannykopping opened 1 week ago
Ok, so from what I understand, all the data we need is already there. We just need to return this data in the timing endpoint. Is that correct? For example:
return sdk.AgentTiming{
StartedAt: a. CreatedAt,
EndedAt: a.FirstConnectedAt,
Agent: a.Name
}
@BrunoQuaresma that sounds about right to me based on the issue description! I would double-check that the time returned roughly matches with the time it takes to download and start the agent.
Hhmm maybe we need the agent download time as well, that would be very useful
@dannykopping so we want to display two timings for each agent, connection, and download times?
@dannykopping so we want to display two timings for each agent, connection, and download times?
I think so, WDYT?
I'm trying to figure out what is the best way to display the agent times based on this issue's discussion. I recorded a video explaining my doubts and thoughts a bit more.
https://github.com/user-attachments/assets/1a25c55f-a0f8-447e-9bb6-268ee413ec9e
I'm for option two, then we are consistent with the patterns.
The agent times are part of dev
. So dev
should have one bar including two smaller boxes. When you click on them, you can see the timings for connection
and download times
. For the colors, if there is no existing state like success or failed (as we have for start
) we can use grey bars without a legend.
Hope that makes sense. Let me know if you prefer a mockup but I think we're aligned @BrunoQuaresma :)
OK, I think we need to take a step back here.
In the design you've got "start" and "dev"; I think I see why you're doing this (because one workspace can have many agents). Even so, the "start" section would have to be under "dev" since each agent has its own startup scripts.
Each agent will have (in order) timings for downloading, connecting, and executing startup scripts. Each should be individual lanes of the timing chart like we have already for "init", "plan", "graph", etc.
I think we should assume that 99% of the time workspaces will have a single agent, so let's keep the design simple by excluding the agent name in that case.
If there are multiple agents, we should repeat the "workspace boot" section and in that we could include the agent name.
On a related note, the main panel label "Provisioning time" is too specific, since it does not include the agent timings. I think we should rename this to "Build timeline" as I suggested here.
I also think we should be more clear and rename "workspace boot" to "agent initialization". The whole timeline is effectively the timings for workspace boot, so this will confuse folks.
@dannykopping it makes sense. It looks like we don't have the download agent timing for now, so I will open a PR to include the connection timing and group it together with the start stage in the same section for each agent. It would look like the following:
Wdyt?
Perfect 👍
I think we should name the stages "run startup scripts" and "connect", to remain consist with the other stages (present tense verbs). The help text for the registration step could be "Time taken to establish an RPC connection with the control plane".
As part of https://github.com/coder/internal/issues/44, we should show when the agent registration occurred and its start/end time in this timeline. This gives users the ability to understand the delta between workspace provisioning completion and the agent startup, which is really an approximate measure of how quickly the compute instance spins up and executes the agent process and how long it takes to call back to the control-plane.
In https://codercom.slack.com/archives/CJURPL8DN/p1729196074736939, Spike and I discussed what would the definition for "agent registration" be. We settled upon:
The time between an agent process starting and the successful completion of the call to /api/v2/workspaceagents/me/rpc
. See here: https://github.com/coder/coder/blob/7da231bc9286060e7ad499fa38cb829c4b055a7d/agent/agent.go#L750-L759The
workspace_agents.first_connected_at
column seems to satisfy this definition fairly well, and is set here: https://github.com/coder/coder/blob/5366f2576f690a3f7d3ac1a4efb8dd49cc2e9bd1/coderd/workspaceagentsrpc.go#L330-L338workspace_agent.created_at
is set here which is invoked once a provisioner job has been completed: https://github.com/coder/coder/blob/fac77f956ec28ad04c0c90750e381e69465d2ae1/coderd/provisionerdserver/provisionerdserver.go#L1782-L1784Based on this, I think we can simply return the value of
created_at
(start) tofirst_connected_at
(end) as a reasonable approximation of agent registration time.Nit: do we want to persist with this "registration time" terminology or is "connection time" more clear?