telemetry(amazonq): measure e2e latency for amazon q requests

jpinkney-aws commented 2 months ago

Problem:

We can't measure e2e how long a chat request takes

Solution:

Plumb the trace id through to the webview -> vscode -> webview. This allows us to know exactly what is happening to a message and how long everything took to get there
chat e2e latency can be linked with the other telemetry events via the traceId

Previous PR description

## Problem This is just a draft of what we could potentially do to measure e2e latency in amazon q chat message requests. Feel free to leave comments on the PR ### Main code level problems: - It's hard to connect a tab id to a conversation id. Tab ID's exist pretty much to send/receive messages but they don't really live along side conversation id's. Conversation id's are exclusively related to the session. The only place they meetup is in their chat session storages - Do we want to track individual events like webviewToDispatcher as metrics? - Though there wouldn't really be a way to "listen" on these events except for if we manually allow-listed metrics since theres no reason to attach a tabID to that metric - How do we listen to events emitted from telemetry so that we can find them later? - One solution is proposed in this PR where we listen to events being emitted and then check if we want to track them - Could we connect these with a parent/child mechanism like we want to introduce eventually? - We kind of do this because the "listener" event is the parent and the things it finds are the children. The problem is we can really have only one "listener" for these Q events, since we don't know what tabs are related to what conversation ID's until much later (after the first request). This is why we build up the map of conversation metrics and tab metrics and join them together before we emit ## Solution This PR does a couple things: - Moves up the chat session storage for codewhisperer chat. This allows us to eventually connect conversation ids to tabs - Introduces a chat storage facades that allows us to call getConversationId on any tab type. This is only implemented for cwc right now - Introduces a listener pattern that allows telemetry events to "listen" to metrics that are being emitted - emits a sample events that can you can trace from "pressing enter" to "receives message in ui" ### Sample event when sending a request in amazon q chat: ``` 2024-09-11 10:11:05.712 [debug] telemetry: chat_roundTrip { Metadata: { duration: '14585', name: 'onChatPrompt', child_metrics: 'initialRequestTime,webviewToDispatcher,dispatcherToFeature,amazonq_enterFocusConversation,amazonq_startConversation,amazonq_addMessage,finalRequestTime', child_metric_durations: '{"webviewToDispatcher":1,"dispatcherToFeature":3,"amazonq_enterFocusConversation":1542,"amazonq_startConversation":2,"amazonq_addMessage":13003,"finalRequestTime":34}', result: 'Succeeded', awsAccount: 'not-set', awsRegion: 'us-east-1' }, Value: 1, Unit: 'None', Passive: false } ``` ### You can also get events from the right click context menu items as well (explain, refactor, etc) ``` 2024-09-11 10:11:34.384 [debug] telemetry: chat_roundTrip { Metadata: { duration: '12885', name: 'aws.amazonq.explainCode', child_metrics: 'initialRequestTime,webviewToDispatcher,amazonq_enterFocusConversation,amazonq_addMessage,finalRequestTime', child_metric_durations: '{"webviewToDispatcher":17,"amazonq_enterFocusConversation":1263,"amazonq_addMessage":11572,"finalRequestTime":33}', result: 'Succeeded', awsAccount: 'not-set', awsRegion: 'us-east-1' }, Value: 1, Unit: 'None', Passive: false } ```

License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

github-actions[bot] commented 2 months ago

This pull request modifies files in src/ but no tests were added/updated. Confirm whether tests should be added or ensure the PR description explains why tests are not required.

jpinkney-aws commented 2 months ago

I think another way we could possible go is to just instrument everything via a traceId which is more traditional for distributed tracing (rather than connecting tabIds to conversation Ids at the end of a flow like we do here). I think this might be more scalable for codewhisperer inline as well.

on chat prompt we could create a trace id, plumb that through the entire system and store various different timings in the RecordMap.

This would remove:

listening on telemetry events
remove the chatStorageFacade

we would keep:

record map
start listening/stop listening

end result:

we still get the chat message duration (but we don't necessarily correlate telemetry events with chat_roundTrip (though we totally can))
less complicated to understand/the flow is easily observable e2e
predictable metrics appearing in the end result

github-actions[bot] commented 1 month ago

This pull request modifies code in src/ but no tests were added/updated. Confirm whether tests should be added or ensure the PR description explains why tests are not required.

aws / aws-toolkit-vscode

telemetry(amazonq): measure e2e latency for amazon q requests #5566

Problem:

Solution: