Add UI for flows - Githubissues

mstange commented 2 weeks ago

Corresponding Bugzilla bug: Bug 1919845.

A "flow" is a way to associate markers with each other, across threads/processes and across different marker types. Example flows could be: a network request, a paint, a touch event, a setTimeout, etc. Multiple markers of different types can then be associated with each other by mentioning the same flow in their fields. And a single marker can be associated with multiple flows, for example a Paint marker could be associated with the flow for the paint itself, the flow for the refresh driver tick it was a part of, the flow for the style flush which requested the paint, and the flow for the composite which put this paint onto the screen.

Example profile: https://share.firefox.dev/4hxDJpt

This example profile was gathered on a build which contains a WIP patch from bug 1919845.

User story: I have a marker for an image request (nsImageLoadingContent::LoadImage). I want to find the marker for the DOM load event which was fired for this image request.

Here's how I would find the DOM load using the current UI, on this profile:

I start at the nsImageLoadingContent::LoadImage marker. It has a flow field with the value 000000010924c9c00.

Click on the marker, copy 000000010924c9c00 from the tooltip. This is our flow ID. Search for 000000010924c9c00.

There are four matching markers. All of these markers are part of the same flow. (In this example we don't have to worry about flow ID reuse.) One of the matching markers is a nsImageLoadingContent::FireEventd marker:

nsImageLoadingContent::FireEventd — nsImageLoadingContent::FireEventd (flow1=000000010924c9c00,flow2=0000000108ef89500)

This marker has a flow2 field. Click the marker, copy 0000000108ef89500.

Now search for 0000000108ef89500. There are three matching markers - one is the nsImageLoadingContent::FireEventd marker we just selected, the other two are AsyncEventDispatcher::Run and ~LoadBlockingAsyncEventDispatcher.

~LoadBlockingAsyncEventDispatcher terminates the flow with the flow ID 0000000108ef89500, because it has this ID in its terminatingFlow field.

The AsyncEventDispatcher::Run has 0000000108ef89500 in its flow1 field. Its flow2 field points to yet another flow: 000000010bc7e2000.

If we search for 000000010bc7e2000, this brings us to a DOMEvent marker. (In this example profile, that DOMEvent marker isn't the actual DOMEvent marker, because the implementation isn't polished yet - the marker we found is just an extra helper marker with roughly the same timestamp.)

Success! This is the DOM event marker we were looking for. We couldn't have found this with the existing markers - the DOMEvent marker just doesn't contain enough information to match the right marker to our nsImageLoadingContent::LoadImage marker.

What UI do we want?

Just putting in free-form strings, as in the example profile, is a good start. But without any UI changes, it can be frustrating to use.

The biggest problem is that the active flows are not tracked. Concretely, the biggest problem is that reused flow IDs cause misleading search results: the search by flow ID finds markers which are in a different flow, just because that other flow happened to use the same flow ID. The search doesn't know about "terminating flows".

The most pressing needs are:

The ability to search by flow, not just flow ID. (See the "Data representation" section below.)
The ability to navigate between the previous and the next marker within a flow.

Strawman proposal

Add two new marker field type: FlowID and TerminatingFlowID. In the JSON serialization, these fields are serialized as strings. (If we ever add a binary format, we can represent flow IDs as u64 in that format.)
In the tooltip, for a flow field, have three buttons inside the tooltip field row: "Previous", "Next", and "View all".
Add a special marker search syntax: flow:000000010924c9c00;10213.54 (flow ID and in-flow-timestamp, matches all markers which belong to this flow)
Clicking "View all" changes the current marker search to one which shows all markers belonging to that flow. Use the timestamp of the clicked marker in the search.
Clicking the next/previous buttons should find the next/previous marker in that flow, select the thread that contains that marker, scroll the marker chart track into view, select the marker, and show its tooltip in the sticky state.
To navigate between connected flows, just click the relevant buttons in the other field.

Tracking flows across threads

You want to go between threads. In the example the markers belonging to the flow we were looking at were all on the same thread, but generally they'll be on different threads.

Navigating through flows becomes a lot more useful when the markers are on different threads.

If you just had the ability to search by flow, you could sort of get by by selecting all the tracks, but it's not ideal.

Data representation

This is heavily inspired by Perfetto.

Here's the "flows" field in the Protobuf definition: https://github.com/google/perfetto/blob/adb90dc0b61030e937326bb20c2118bc3110e7f5/protos/perfetto/trace/track_event/track_event.proto#L204-L222

There are flows and flow IDs. A flow ID is usually generated from a pointer, whose address is XORed with a per-process UUID. Since pointer addresses can get reused, two different flows may have the same flow ID. If you have a flow ID, how do you determine which flow it refers to? It depends on the timestamp of the marker with the flow ID. There is a notion of "active flows" for each timestamp. If a flow ID is used in a TerminatingFlowID field, then this marker terminates the flow - starting at this marker's timestamp, that flow is no longer active. So if you look at markers ordered by time, and you see a flow ID and there exists no active flow with that flow ID, then this marker starts a new flow for that flow ID and puts it into the list of active flows.

So markers can:

start a flow, by having a FlowID field which is set to a flow ID which is not part of the active flows
terminate a flow, by mentioning the flow ID in a TerminatingFlowID field
connect multiple flows, by having multiple fields of type FlowID.

Flows can cross processes because flow IDs are more-or-less globally unique.

In Perfetto's representation, the flows and terminating flows for an event are just a list of IDs. In our representation, we would have a separate marker field for each flow / terminating flow. Is this good? It would at least allow us to name the flows. For example, if your flow is generated from a imgIRequest* pointer, you could put the flow ID into a field that's named request. Not sure if that's useful. Having a different representation means that it would be harder to convert perfetto traces to the Firefox profiler format.

┆Issue is synchronized with this Jira Task

julienw commented 1 day ago

Thanks, this looks pretty solid!

I have a few remarks and questions.

First questions:

Here's how I would find the DOM load using the current UI, on this profile:

From your example it looks like you see the flows as a tree. For example (all markers are ordered by timestamp in each of the flows):

Action 1 => Flow 1 => Marker A (Flow 1), Marker B (Flows 1, 2), Marker C (Flows 1) Flow 2 => Marker D (Flow 2), Marker B (Flows 1, 2) Marker E (Flow 2), Marker F (Flows 2, 3) Flow 3 => Marker F (Flows 2, 3), Marker G (Flow 3)

From there, here are a few questions:

Does Flow 3 inherit from Flow 2 which itself inherits from Flow 1? (in that case it's possible that markers from Flow 2 should also have Flow 1? Maybe?)
Or are they independent parallel flows?
Should we, when displaying Flow 1, also display Flow 2 and Flow 3, or make it easy to display them (that is without having to display the tooltips for the markers that have these other flows)?
Should we distinguish causal relationships? If yes, how?

Clicking "View all" changes the current marker search to one which shows all markers belonging to that flow. Use the timestamp of the clicked marker in the search.

Does that mean that "view all" should also select all threads that contain such markers, so that we would see them in the view?

If a flow ID is used in a TerminatingFlowID field, then this marker terminates the flow - starting at this marker's timestamp, that flow is no longer active.

=> from your text, I understand "the flow is no longer active at the marker's START timestamp", is that what you intended? Or should that be the END timestamp? (in the case of an instant marker, START == END)

A flow ID is usually generated from a pointer, whose address is XORed with a per-process UUID.

Just a clarification: is it expected that flows can also be across processes, not just across threads in the same process? I also guess that we could have flow ids that are not generated from a pointer, eg a connection id or a network channel id.

In Perfetto's representation, the flows and terminating flows for an event are just a list of IDs. In our representation, we would have a separate marker field for each flow / terminating flow. Is this good? It would at least allow us to name the flows. For example, if your flow is generated from a imgIRequest* pointer, you could put the flow ID into a field that's named request. Not sure if that's useful. Having a different representation means that it would be harder to convert perfetto traces to the Firefox profiler format.

I think I'd be in favor of just a list of IDs as well. Naming the flow could be possible by for example using the name of the first marker in the flow?

Initially I was thinking of a way to type the flows. Different types could mean different layers: for example flow 1 would be from an interaction, flow 2 coming from a network request (and of course some markers could have both 1 and 2). That means we'd need to keep a dictionary of them so that we can output it to the JSON, but that could be difficult.

I think it's worth roughly deciding how a future UI could look like even if we don't build it right now. We could organize the markers per flow, but then we need to decide where one marker being in several flows would be drawn. Maybe such marker would be in the first-starting flow, with an arrow pointing to the other flows? Other questions to answer?

mstange commented 4 hours ago

From your example it looks like you see the flows as a tree.

In my example I'd say it's undirected graph; Flow 1 --- Flow 2 --- Flow 3 . It's a graph rather than a tree because there can be cycles, for example there could be a marker which mentions both flow 1 and flow 3. It's undirected rather than directed because a marker with two flows does not imply a direction between those two flows.

But since I wrote down my example, my thoughts have shifted a bit, and I am thinking that having directions would actually be pretty good. But I would need a new example to describe this properly.

For example (all markers are ordered by timestamp in each of the flows):

Action 1 => Flow 1 => Marker A (Flow 1), Marker B (Flows 1, 2), Marker C (Flows 1) Flow 2 => Marker D (Flow 2), Marker B (Flows 1, 2) Marker E (Flow 2), Marker F (Flows 2, 3) Flow 3 => Marker F (Flows 2, 3), Marker G (Flow 3)

From there, here are a few questions:
1. Does Flow 3 inherit from Flow 2 which itself inherits from Flow 1?

I'm not sure what you mean by "inherits". But you could definitely say Flow 3 is caused by Flow 2 which is caused by Flow 1, at least in this example: the DOMEvent, which is what Flow 3 represents, is run because the AsyncEventDispatcher runs (Flow 2 represents the AsyncEventDispatcher object), and the AsyncEventDispatcher runnable runs because it was dispatched during the nsImageLoadingContent::FireEvent call. Flow 1 represents the nsImageLoadingContent object.

(in that case it's possible that markers from Flow 2 should also have Flow 1? Maybe?)

I don't understand this question - one marker from Flow 2 also has Flow 1, that's what connects the two flows. The other markers from Flow 2 don't want to have to keep track of the Flow 1 ID.

2. Or are they independent parallel flows?

I consider them to be independent parallel flows which are "directly connected" with each other by having a shared marker.

3. Should we, when displaying Flow 1, also display Flow 2 and Flow 3, or make it easy to display them (that is without having to display the tooltips for the markers that have these other flows)?

Yes, I think such a UI would be very useful. For the first UI I would say we should only go one step at a time; for example, if Flow 1 is entered, it would be useful to see Flow 2. Then when you enter Flow 2 from there, we should show both Flow 1 and Flow 3, because those are the flows that are connected to Flow 2 within a single "step of connection". But maybe we'll realize that we want more than just one step, I'm not sure.

4. Should we distinguish causal relationships? If yes, how?

Good question! I think we probably want to differentiate "incoming" and "outgoing" flows at least. I'll say more about this at the end of this post.

Clicking "View all" changes the current marker search to one which shows all markers belonging to that flow. Use the timestamp of the clicked marker in the search.

Does that mean that "view all" should also select all threads that contain such markers, so that we would see them in the view?

If we still want to go with this UI idea (see the end of this comment), then yes, that would be useful.

If a flow ID is used in a TerminatingFlowID field, then this marker terminates the flow - starting at this marker's timestamp, that flow is no longer active.

=> from your text, I understand "the flow is no longer active at the marker's START timestamp", is that what you intended? Or should that be the END timestamp? (in the case of an instant marker, START == END)

I left it intentionally vague because I wasn't sure what would be the right choice. But yes, let's make it the marker's START timestamp for now. If that causes trouble, we could make it so that we treat interval markers which are based on separate start and end raw markers differently: if the flow is a field in the start marker, we use the start timestamp, and if it's in the end marker, we use the end timestamp.

A flow ID is usually generated from a pointer, whose address is XORed with a per-process UUID.

Just a clarification: is it expected that flows can also be across processes, not just across threads in the same process? I also guess that we could have flow ids that are not generated from a pointer, eg a connection id or a network channel id.

That's exactly right: for flows which connect across processes, you would not generate the ID based on a pointer, you would generate it based on some kind of counter and maybe a random u64 "namespace". If a cross-process flow ID were generated from a pointer, you'd run into trouble when you reuse that address, because there's no good place to put the terminating flow. But using counters for this is fine because you're in control of when a counter is reused.

In Perfetto's representation, the flows and terminating flows for an event are just a list of IDs. In our representation, we would have a separate marker field for each flow / terminating flow. Is this good? It would at least allow us to name the flows. For example, if your flow is generated from a imgIRequest* pointer, you could put the flow ID into a field that's named request. Not sure if that's useful. Having a different representation means that it would be harder to convert perfetto traces to the Firefox profiler format.

I think I'd be in favor of just a list of IDs as well. Naming the flow could be possible by for example using the name of the first marker in the flow?

The main use case I can think of where a list of IDs would be useful is when you want to instrument code which has a dynamically-sized list of associated entities, for example a refresh tick might have a list of CSS animations that caused that tick, and you might want one flow per animation, and then put those into a list of flow IDs.

But mostly I think just one flow per field is actually fine. I think that's what maps to the current format most cleanly. I'd prefer to wait until we have a convincing use case before adding arrays of flows. You can always emit more markers with individual flows as a workaround.

Initially I was thinking of a way to type the flows. Different types could mean different layers: for example flow 1 would be from an interaction, flow 2 coming from a network request (and of course some markers could have both 1 and 2). That means we'd need to keep a dictionary of them so that we can output it to the JSON, but that could be difficult.

I agree that typing flows could be quite useful. We could have a separate flowSchemas list, and then in the marker schema, the description of the flow field could have an index into to that list of schemas. But that's something we can add later. Eventually this could allow us to replace the special handling of IPC markers and network markers with something schema-based, where IPC messages and network requests are just flows with special annotations.

For posterity, I'll write down a quick list of customizations that we might want to perform:

'This flow is a "Paint" / "Event listener" / "CSS animation" / "setTimeout", please label the chart row in the flow panel accordingly'
'This runnable flow should be labeled with the runnable name, which is in this marker's name field'
'This task flow should be associated with the thread that this task interval marker is on, not with the thread which enqueued the task'
'Take the URL field from this marker and put it on this flow which represents a network request'
'The rectangle for this flow should be tinted in green because this is a network request for an image'
'Starting at the start time of this marker, the opacity of the tinted fill in this flow's box should be increased to 30%'
'This dispatch marker should be considered "outside"/"before" the flow's "main interval", only draw a thin line until the runnable actually starts'
Similarly, 'This DidComposite marker should be considered "outside"/"after" the paint flow's main interval, don't make it part of the box, just draw a horizontal line to it from the end of the box'
'All flows of this type should be displayed in a timeline row (network track)'

I think it's worth roughly deciding how a future UI could look like even if we don't build it right now. We could organize the markers per flow, but then we need to decide where one marker being in several flows would be drawn. Maybe such marker would be in the first-starting flow, with an arrow pointing to the other flows? Other questions to answer?

Yes, let's do that. Here's what I have in mind:

Flow panel (or "flow chart"? hah)

The flow panel is like a second marker chart that is displayed in the marker chart tab, below the existing marker chart, with a horizontal splitter in between. It only appears once you enter a flow by clicking a marker and entering a flow from it somehow. The flow panel does not respect the selected thread or the marker chart's search filter. It remains as-is even when the user selects a different thread. Its contents only change when you enter a different flow. But it respects the current preview selection, i.e it stays in sync with the horizontal time axis / zoom of the rest of the marker chart. The flow panel has one row per flow. Each flow has a start and an end time, implied by its first and last marker. A flow is drawn in its row as a box with rounded corners, similar to a network request in the network panel. Within the flow's box, there's a dot at the start timestamp of each of the flow's markers. The flow panel displays the current flow, as well as one row for each of the flows which are directly connected to this flow, i.e. flows which share markers with this flow. Shared markers are duplicated in the flow's rows, i.e. such a marker would have multiple dots. (And outside the flow panel, that same marker might have another diamond or box in the regular marker chart.) Hovering a dot in the flow shows the tooltip for the marker that the dot represents. If this marker directly connects this flow to another flow, a vertical connection line is drawn between the dots that represent this marker. If this marker is an interval marker, we should also highlight the entire inteval within the flow box somehow. You can enter another flow with a single click. (But I'm not sure where you'd click!) If you want to keep the current flow on the screen as you explore other flows, you can "pin" a row. This keeps the row for that flow as a pinned row at the bottom of the flow panel and it won't go away as you enter other flows, until you unpin it.

Context flows

One thing we noticed pretty quickly during prototyping was that it was a hassle to manually connect flows. For example, if you have a flow for a runnable, you'd like to connect this runnable to any runnables that get dispatched while this runnable is running. But propagating an ID down the stack can be very cumbersome; you might have to pass it through many functions, or you'd have to put it in a global / into TLS. In the initial example, we special-cased the AsyncEventDispatcher runnable, but really this is just one of many runnable types, and we'd rather have flows for all runnables rather than having to do something custom for each runnable type.

So we want some implicit connection. Here's our current idea:

Make stack-based markers create "context flow" connections.

Let's call markers with flow fields "flow markers". For every stack-based flow marker, define the closest stack-based interval flow marker that contains it as its "parent context flow marker". And each stack-based interval flow marker can have zero or more "child context flow markers", i.e. the flow markers that have this flow marker as their parent context flow marker. The parent context flow marker can be part of one or more flows, based on its flow fields. A flow marker's "incoming context flows" are the flows that its parent context flow marker is part of. A flow marker's "outgoing context flows" are the flows that its child context flow markers are part of.

For example, runnable A runs and dispatches runnables B and C. And then B dispatches D and E. The marker for the "dispatch" of runnable D would have the flow for runnable B as its "incoming flow". The marker for the dispatch of runnable B would have the flow for runnable A as its "incoming flow". The marker for the "running" of runnable A would have the flows for B and C as its "outgoing flows".

With context flows, we can avoid having to propagate flow IDs in many cases. This also reduces the need to put multiple flow fields on a single marker. You just put one flow ID on your marker, and the "caused by" connection is implied because there's a stack-based flow marker on the stack.

Flow panel and context flows

With context flows, entering a flow would display the following rows in the flow panel:

The incoming context flows for the current flow (i.e. all flows which are incoming context flows for at least one of the flow markers that are part of the current flow)
The current flow
Any flows which are directly connected to the current flow
The outgoing context flows for the current flow

And then hovering a dot in a flow would cause arrows to be displayed, in the direction "incoming to outgoing". For example, hovering the dot for a the start of a flow marker which has a parent context flow marker would draw a vertical arrow from the inside the interval of the parent context flow marker (from within the incoming context flow's row) to the hovered dot. And hovering the start time dot for an interval flow marker which has three child context flow markers would draw three vertical arrows from within the interval to the dots of each of the child context flow markers. And the dots for those associated markers would be highlighted inside their respective flows during the hovering. (And maybe even the entire intervals, for associated interval markers.)