Open mstange opened 7 years ago
To get the roots, find all funcStacks that have at least one sample that points directly to that funcStack. Group those by func. Each group is a root node. To get the children of a node, collect the prefixFuncStacks of all the funcStacks of that node, and then group those by func. Each of those groups is a child node.
This approach requires less work upfront (invertStacks can be quite expensive), but creates more garbage in the process. At least I think it does - I'm not sure if there's a way to do the collection and grouping without having lots of JS objects involved in the process.
It would be good to have a canonical FuncStackTable
that the indexes would stay the same, that way we can refer to funcStacks across various filtering operations.
I think this change would break all of the data transformation work. I'm going to tentatively close it as no longer relevant. Feel free to re-open if you disagree @mstange
I think the inversion still happens at the end of the transform pipeline, so most of what I wrote above should still apply. I still think this is worth doing.
Please test any patches off of the example inverted profile from #825:
gives the following profile:
Here are some more thoughts on this.
The goal is to use the callNodeTable
of the non-inverted profile for the display of the inverted tree.
Example:
Non-inverted callNode tree:
[callNodeIndex] [functionName] [totalTime] [selfTime]
0 (root) 25 0
- 1 main 25 0
- 2 runLoop 25 10
- 3 appEvent 8 1
- 4 appNetworkEventHandler 4 2
- 5 queueEvent 2 1
- 6 vecPush 1 1
- 7 appWorkFinishedEventHandler 3 3
- 8 nativeEvent 7 1
- 9 nativeClickEventHandler 4 1
- 10 queueEvent 3 2
- 11 vecPush 1 1
- 12 nativeKeyEventHandler 2 1
- 13 vecPush 1 1
Inverted tree:
[rootCallNodeIndex] [nodeCallNodeIndex] [time]
2 2 runLoop 10
- 2 1 main 10
- 2 0 (root) 10
5 5 queueEvent 3
- 10 9 nativeClickEventHandler 2
- 10 8 nativeEvent 2
- 10 2 runLoop 2
- 10 1 main 2
- 10 0 (root) 2
- 5 4 appNetworkEventHandler 1
- 5 3 appEvent 1
- 5 2 runLoop 1
- 5 1 main 1
- 5 0 (root) 1
6 6 vecPush 3
- 6 5 queueEvent 2
- 6 4 appNetworkEventHandler 1
- 6 3 appEvent 1
- 6 2 runLoop 1
- 6 1 main 1
- 6 0 (root) 1
- 11 9 nativeClickEventHandler 1
- 11 8 nativeEvent 1
- 11 2 runLoop 1
- 11 1 main 1
- 11 0 (root) 1
- 13 12 nativeKeyEventHandler 1
- 13 8 nativeEvent 1
- 13 2 runLoop 1
- 13 1 main 1
- 13 0 (root) 1
...
Each tree node in the inverted tree is uniquely identified by the callPath to it, i.e. the list of funcs you need to follow from a function that has self time, to the node. For example, the node 6 5 queueEvent
in the inverted tree is identified by the call path vecPush <- queueEvent
, and the node 5 2 runLoop
is identified by the call path queueEvent <- appNetworkEventHandler <- appEvent <- runLoop
.
The roots of the inverted tree are all the funcs for which there exist a callNode with non-zero self time. For example, the vecPush root represents the three callNodes of the func vecPush
with non-zero self time: 6, 11, and 13.
Each tree node in the inverted tree can also be expressed using two call nodes from the non-inverted tree:
rootCallNodeIndex
.
(Here, "root" means "root in the inverted tree".)rootCallNodeIndex
, in the non-inverted tree, go along the call path to the start of the call path. Call the callNode at the start of the path nodeCallNodeIndex
.Now use the pair (rootCallNodeIndex, nodeCallNodeIndex)
to represent this node in the inverted tree.
Going from this index pair back to the call path is easier: You look up rootCallNodeIndex
in the non-inverted tree and follow its prefix path until you hit nodeCallNodeIndex
. The list of funcs you encounter on this path forms the call path.
Example 1: For the node vecPush <- queueEvent
in the inverted tree, we can find two call nodes in the non-inverted tree that match this path: 6 and 11. (The 2-long prefix paths of both nodes
is queueEvent -> vecPush
.)
6 is the lower of the two. The call node path queueEvent -> vecPush
which ends at callNode 6 starts at callNode 5.
So we represent the node vecPush <- queueEvent
in the inverted tree as the index pair (6, 5).
Example 2: vecPush <- queueEvent <- nativeClickEventHandler <- nativeEvent
.
There is only one call node in the non-inverted tree that matches this path, callNode 11.
Here, the path starts at node 8.
Represent the node vecPush <- queueEvent <- nativeClickEventHandler <- nativeEvent
as the index pair (11, 8).
Example 3: Go from the index pair (10, 1) to the callPath.
CallNode 10 is 10 queueEvent
.
Its prefix path is queueEvent <- nativeClickEventHandler <- nativeEvent <- runLoop <- main ...
and we stop at main because that corresponds to callNode 1 main
, callNode index 1.
[to be continued]
It turns out the canonical index computation is not worth it - computing this index takes a long time and we have to do it for many nodes. Using a stringified call path as the node ID works much better.
I have some of this working in a pretty messy branch in my repo: https://github.com/devtools-html/perf.html/compare/master...mstange:fast-inverted-tree?expand=1#diff-d973fedad79fc02f8035c8c4a63d5b3c
Here's my plan of attack for doing this properly:
CallTreeNodeID
which is different from IndexIntoCallNodeTable
:
CallTree
API.
null
and -1
in the CallTree API.CallTreeNodeID
and IndexIntoCallNodeTable
being the same:
CallTree
.getSelectedCallNodeIndex
.getCallTree
selector, which depends on the range selection. So getSelectedCallNodeIndex
(and, more importantly, getExpandedCallNodeIndexes
) would be recomputed whenever the range selection changes, which is undesirable. So:CallTreeNodeID
<-> CallNodePath
conversion) and one that does depend on the samples (and computes costs per node)?selectedFunc
and selectedCallNodePath
instead of selectedCallNodeIndex
func
and the depth for a given call tree node ID.i
matches the currently selected call pathgetHeaviestStackUnderCallNodePath
method on the call tree, instead of doing manual computations in invertCallNodePath
.CallTreeNodeID
as a string instead of a number, and insert the necessary conversions. This will slightly decrease performance for the non-inverted tree.InvertedCallTree
class and use it for inverted call trees:InvertedCallTree
with the non-inverted CallNodeTable
.CallTree
into a flow interface and rename the existing CallTree
class to something else, maybe NonInvertedCallTree
.InvertedCallTree
in a similar way to how I did it in my branch:
CallTreeNodeID
s are string concatenations of the call node path (path.join('-')
)callNodes
which are the call nodes from the non-inverted tree that contributed time to this subtree of the inverted call tree.It would be nice to be able to use different node ID types the two tree types: integers for the non-inverted tree and strings for the inverted tree. But I don't think flow would make that easy - there's only one getSelectedCallNodeIndex
selector, for example, and it needs to be clear about what it wants to return. And it can't return string | number
because then, both trees would need to be able to handle both ID types.
Here's the CallTree type definition that we should have at the end (or something similar):
export type CallTree = Tree<CallTreeNodeID, CallTreeNodeDisplayData> & {
preloadChildrenCache(): void,
getNodeData(node: CallTreeNodeID): CallNodeData,
getTimingDisplayData(CallTreeNodeID: CallTreeNodeID): Object,
getCallNodePathFromNodeIndex(
CallTreeNodeID: CallTreeNodeID | null
): CallNodePath,
getNodeIndexFromCallNodePath(
callNodePath: CallNodePath
): CallTreeNodeID | null,
getNodeIndicesFromCallNodePaths(
callNodePaths: Array<CallNodePath>
): Array<CallTreeNodeID | null>,
getAllDescendants(node: CallTreeNodeID): Set<CallTreeNodeID>,
getHeaviestStackUnderCallNodePath(callNodePath: CallNodePath): CallNodePath,
};
I feel like there are 2 things convoluted in one:
Can we discuss about the good and bad consequences of these 2 changes ?
Especially, can't we change the implementation and keep the architecture we have ? My biggest concern is that we have a lot of code depending on this architecture, and that do not depend on a CallTree object (the latest being the sidebar implementation).
I'm pretty sure that the second part is required for the first part to have acceptable performance. The main reason why this implementation improves performance is the fact that we don't compute all nodes in the inverted tree upfront. We only compute the nodes we need to display.
The implementation isn't fundamentally different. It's just more lazy. So you can't just plug it into the existing architecture, which relies on having a complete CallNodeTable and an integer index for each node in the inverted tree.
Oh, there's one other important difference, apart from the laziness: The inversion operates on the CallNodeTable instead of on the StackTable of the non-inverted tree.
It might be worth trying that part out on its own:
Instead of going regularStackTable -> invertedStackTable -> invertedCallNodeTable
,
go regularStackTable -> regularCallNodeTable -> invertedCallNodeTable
.
See #1078
We should also make it so that the thread graphs are drawn based on the non-inverted thread. At the moment, if you check the "Invert call tree" checkbox, we compute all inverted threads, not just the selected thread.
Thanks, I finally filed #3313 about these 2 last comments. This sounds like a good self-contained idea to try first.
I found a profile which is slow to invert even on my fast machine: https://share.firefox.dev/3yivFDy (and I've edited the first comment of this issue to link to it)
It might be worth trying that part out on its own: Instead of going
regularStackTable -> invertedStackTable -> invertedCallNodeTable
, goregularStackTable -> regularCallNodeTable -> invertedCallNodeTable
.
I gave this a try and it didn't help much. It cut down the inversion time from 40 seconds to 18 seconds on this profile, but we really want to go to 200ms or less. So I think some kind of lazy solution is necessary. I'll think about this some more.
It cut down the inversion time from 40 seconds to 18 seconds on this profile, but we really want to go to 200ms or less.
Haha, I set the goal pretty high here! But I'm happy to report that I achieved it: In the deploy preview in #4806, the profile above now inverts in 198ms. 🎉 I'll work on polishing up the patches and splitting them into reviewable pieces now.
Inverting this profile is pretty slow: https://share.firefox.dev/3yivFDy
Currently, we go from stacks to inverted stacks to funcStacks.
It might be better to go from stacks to funcStacks, i.e. have non-inverted funcStack info, and then do the inversion when constructing the tree.
┆Issue is synchronized with this Jira Task