Open malytomas opened 5 years ago
In order to minimise performance overhead and memory footprint - Optick stores only timestamp and thread ID for each tag. Then in the GUI it searches for the most "bottom" function for this timestamp. Sometimes due to the OS timer resolution step timestamp for a tag from the parent function ("pathfinding") could match the start timestamp of the next function ("AStar") in your case - in this case Optick doesn't know which exactly function produced this tag. Nothing to worry here, it's not a race condition. But I'll see what I could do to get more consistent and less confusing tag distribution.
Thanks for the explanation. I thought that you have a thread local stack of events and just associate the tag with the event on top of the stack. Is TLS access that much more expensive? I have never measured it so I am asking purely out of curiosity. The low overhead is really awesome and some occasionally misassigned tags are probably not worth any complications.
Edit: Basically, I am comparing few TLS accesses with acquiring timestamp and thread id.
"ThreadID" is acquired automatically as Optick accumulates events and tags into per-thread TLS storage to avoid any synchronization primitives, so we get ThreadID pretty much for free. There is also no dedicated "stack" of events. In order to get the maximum throughput Optick accumulates all the events directly into a linear buffer. Then on the GUI side all the events are sorted in a special way and organized into a tree based on start/stop timestamps. So currently there is no cheap way to get "current event" for the tag.
To answer your question: which one is faster? 1) TLS access is super-fast nowadays. I'd estimate the cost as one-two potential L1/L2 cache misses. Should be safe to assume that the cost will be around ~38 cycles taking in account usage pattern. 2) Acquiring timestmap is super-fast as well. Modern CPUs use a special TSC hardware counters, so getting a timestamp will cost you "order of 10s or 100s of machine cycles, depending on the processor type" (see link below).
As you can see these two options are very close in terms of performance. Both are pretty fast.
Here is a couple of links if you want a bit more details about QPC (https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps) and TLS (http://david-grs.github.io/tls_performance_overhead_cost_linux/).
Memory savings here are more improtant than performance savings though. With just a timestamp we could store a linear array of tags of the fixed size witohut keeping any pointers to the event buffer. This separation is also a necessary step for the upcoming "live mode" where Optick will use available memory as a ring-buffer for events overriding old data with new. The less depepdencies between tags and events - the better.
In any case, I feel like I should be able to find a solution based on the timestamps for this problem.
In the picture below, the tag pathCost is incorrectly associated with an event that is already out of scope.
Pseudocode:
Tested with commit tagged 1.2.5.0.
It took several attempts to make it happen, so I guess it is some sort of a race condition?