catapult-project / catapult


Nested async sections display improperly #1452

Open scheibo opened 9 years ago

scheibo commented 9 years ago

Example trace: https://gist.github.com/scheibo/87a9025bb44c98a1bb0c

[screenshot "example": the trace as currently rendered by trace-viewer]
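
For readers without the gist handy, nested async events in the JSON trace format look roughly like the following. This is a hedged sketch, not the gist's exact contents: it uses the legacy async begin/finish phases ("S"/"F") with a shared id so the events nest, the slice names from the discussion below, and the [6000, 8000] / [6000, 9000] child timestamps quoted later in the thread (the Root and Grand Child times are made up):

    [
      {"name": "Root",        "cat": "example", "ph": "S", "ts": 5000, "pid": 1, "tid": 1, "id": "0x1"},
      {"name": "Child 1",     "cat": "example", "ph": "S", "ts": 6000, "pid": 1, "tid": 1, "id": "0x1"},
      {"name": "Child 2",     "cat": "example", "ph": "S", "ts": 6000, "pid": 1, "tid": 1, "id": "0x1"},
      {"name": "Grand Child", "cat": "example", "ph": "S", "ts": 6500, "pid": 1, "tid": 1, "id": "0x1"},
      {"name": "Grand Child", "cat": "example", "ph": "F", "ts": 7500, "pid": 1, "tid": 1, "id": "0x1"},
      {"name": "Child 1",     "cat": "example", "ph": "F", "ts": 8000, "pid": 1, "tid": 1, "id": "0x1"},
      {"name": "Child 2",     "cat": "example", "ph": "F", "ts": 9000, "pid": 1, "tid": 1, "id": "0x1"},
      {"name": "Root",        "cat": "example", "ph": "F", "ts": 9500, "pid": 1, "tid": 1, "id": "0x1"}
    ]

Because Child 1 and Child 2 start at the same time but end at different times, their "F" events arrive in an order that no strict stack of open slices can satisfy, which is what trips up the import.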

createAsyncSlices_ is doing the wrong thing in this case (it produces several import errors and botches the visualization); I would instead expect the result to be one of:

a) [screenshot: xmfykgzqzvk]

In this example, Child 1 and Child 2 are children of Root, and Grand Child is a child of Child 1. This is the 'intended' structure, but it is: i) impossible for trace-viewer to infer from the timestamps alone (the hierarchy is encoded in a custom _parent_id field that trace-viewer doesn't know to look at), and ii) rendered as something that looks like garbage because of the overlap. Even better would be if Child 2 were not forced to overlap with Child 1 and instead just showed up in a separate row, like:

[ .......................Root....................... ]
      [ ............Child 1............ ]
                      [ Grand Child ]
             [ ............Child 2............ ]

(Of course, with that visualization it would not be clear that Child 2 is a child of Root rather than of Child 1 or Grand Child.)

b) [screenshot: mvjrcugz5gz]

This is achieved by removing the:

   while (parentIndex < parentStack.length) {
     parentStack.pop();
   }

from line 985 (though simply making that change would probably introduce performance and correctness issues in the general case). This visualization doesn't preserve the 'intended' parenting relationship, but it is very readable.


Either way, the current display method is considerably worse than either of these two options when displaying complicated nested trees; some reworking of trace_event_importer lines 965-995 seems like it would help resolve this issue.
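
To make the ambiguity concrete, here is a small self-contained JavaScript sketch of a stack-based parenting pass like the one described above (illustrative only, not the actual createAsyncSlices_ code; the Root and Grand Child times are made up). Parenting each slice under the deepest still-open slice cannot recover the intended tree from timestamps alone:

    // Toy stack-based parenting pass. Slices are assumed sorted by start time.
    function parentByStack(slices) {
      var stack = [];
      slices.forEach(function(s) {
        // Pop every slice that has already ended by the time s starts.
        while (stack.length && stack[stack.length - 1].end <= s.start)
          stack.pop();
        s.parent = stack.length ? stack[stack.length - 1].title : '(none)';
        stack.push(s);
      });
      return slices;
    }

    parentByStack([
      {title: 'Root',        start: 5000, end: 9500},
      {title: 'Child 1',     start: 6000, end: 8000},
      {title: 'Child 2',     start: 6000, end: 9000},
      {title: 'Grand Child', start: 6500, end: 7500}
    ]).forEach(function(s) { console.log(s.title + ' -> ' + s.parent); });
    // Prints: Root -> (none), Child 1 -> Root, Child 2 -> Child 1,
    //         Grand Child -> Child 2.
    // Child 2 and Grand Child both get the wrong parent, and Child 2
    // "ends" after its supposed parent Child 1 does.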

dj2 commented 9 years ago

@philippebeaudoin this seems like it would be a use case for the depth parameter you were thinking about for slices.

philippebeaudoin commented 9 years ago

Looking at the trace, your "child 1" slice is [6000, 8000] and "child 2" is [6000, 9000].

It's a deeper question whether we want to support slices happening in parallel on a given Thread/SliceGroup. IMHO, SliceGroup/Slices are not the right place to do this. I would like to understand better what you're trying to do here and how you end up with this. Maybe AsyncSlice is what you need here?

scheibo commented 9 years ago

I knew that for duration events the hierarchy depends on non-overlapping sections, but the example trace uses nested async events, where it is not explicit that they shouldn't overlap. I would have imagined that nested async events would map to AsyncSlices?

dj2 commented 9 years ago

If memory serves, even with async we assume the call stack isn't overlapping. This is because we used async to follow the flow of things through the system. It's not possible to send a message and have it processed before it arrives.

I may be wrong in that regard, though; it's been a while since I looked at the async code.

scheibo commented 9 years ago

What if you send a number of messages and they can be processed at arbitrary times after some delay (for example, queuing with priorities)? With full access to instrument all of the layers it would probably be best to use flow + duration events, but if you have to black-box the queuing delay, async events are the only option, and they can definitely end up overlapping when messages are addressed out of order, i.e.:

all = tracing.startAsync("Submitting")
messages.each do |message|
  message.section = tracing.startAsync("Submitting #{message.id}")
  # callback gets fired when the message is done being processed
  submit(message, callback: ->(m) { m.section.end })
end

# assume we have something which gets fired when all submitted messages are complete
whenAllComplete(messages, -> { all.end })

Perhaps a bad example, because I think it might still be possible to capture this better with flow + duration events, but my point is that we use async sections when we can't easily use flow + duration, because we don't actually know when code is executing versus just waiting on I/O, the network, or queuing. It's still useful to get the 'broad strokes' in an async event so we can measure high-level actions, and it would definitely be better if we could break things down more exactly in the future; but to transition from our current instrumentation system to one that can be visualized with trace-viewer, relaxing the non-overlap requirement on async events would be useful.
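
For comparison, a rough sketch of the flow + duration alternative mentioned above, under the assumption that both the submitting and the processing side can be instrumented (names, ids, and times are made up; "X" is a complete duration event, and "s"/"f" are flow start/finish events tied together by cat + id, with "bp": "e" binding the finish to its enclosing slice):

    [
      {"name": "submit msg 42",  "cat": "queue", "ph": "X", "ts": 1000, "dur": 50,  "pid": 1, "tid": 1},
      {"name": "msg 42",         "cat": "queue", "ph": "s", "ts": 1010, "pid": 1, "tid": 1, "id": "0x2a"},
      {"name": "process msg 42", "cat": "queue", "ph": "X", "ts": 5000, "dur": 300, "pid": 1, "tid": 2},
      {"name": "msg 42",         "cat": "queue", "ph": "f", "ts": 5000, "pid": 1, "tid": 2, "id": "0x2a", "bp": "e"}
    ]

Here the queuing delay shows up as the flow arrow between the two duration events rather than as long overlapping async slices; but this only works when the processing side can be instrumented, and when it is a black box, the overlapping async events above are all the trace can say.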