Celtoys / Remotery

Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
Apache License 2.0

Q: Generic long running events? #213

Open JCash opened 2 years ago

JCash commented 2 years ago

I wish to be able to log events that span multiple frames, e.g. the loading of a resource in the game.

Example: https://i.stack.imgur.com/odD1S.png

By the looks of the Remotery example image (in the Readme.md), the "Processor timeline" is something related to that, but I'm not sure?

Is this something that is currently supported? If not, what would be the recommended way to do it (assuming it fits within the scope/vision of Remotery)?

/Mathias

dwilliamson commented 2 years ago

It does support this, but it requires the thread sampler to be active. The sampler periodically suspends the thread, checks whether a sample tree is still in progress (stalling), and sends what it has so far to the viewer as a "partial tree."

This only works on Windows as it requires a platform implementation of thread suspension for the stall check.

The stall check is currently intermingled with the thread sampler, which you do not need for this stalling functionality. So two things would need to be done to get this working on other platforms: port the thread suspension the stall check relies on, and separate the stall check from the rest of the thread sampler.

dwilliamson commented 2 years ago

Worth mentioning that suspending the thread is not the ideal way to check for in-progress sample trees; it's just the most convenient place for the check right now. The best solution would be a custom thread that wakes up every 30ms or so, checks for stalling trees, and somehow copies them safely for communication as a "partial tree."

The current variable treeBeingModified is only safe to use because the thread that uses it is suspended. It's not a generic synchronisation primitive.
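
As a rough sketch of that watcher idea (all names and layout here are illustrative, not Remotery's actual internals):

#include <stdatomic.h>
#include <stdint.h>

typedef struct ThreadSampler
{
    /* Microsecond timestamp of the currently open root sample, 0 if none;
       written by the instrumented thread when it pushes/pops the root. */
    atomic_uint_fast64_t root_opened_at_us;
} ThreadSampler;

/* Called from the custom watcher thread every ~30ms */
void CheckForStallingTrees(ThreadSampler* samplers, int count, uint64_t now_us)
{
    for (int i = 0; i < count; i++)
    {
        uint64_t opened = atomic_load(&samplers[i].root_opened_at_us);
        if (opened != 0 && now_us - opened > 30000)
        {
            /* This tree is stalling: the hard part is safely copying it as
               a "partial tree" while the owning thread keeps adding to it. */
        }
    }
}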

JCash commented 2 years ago

Thanks! That's good to know, and it does indeed sound a bit complex.

I mentioned another approach earlier: "manually" adding info to Remotery, if that's possible (see https://github.com/Celtoys/Remotery/issues/193).

E.g. sending a start event with name + time + thread id, and then, later on, sending the corresponding end event.

How feasible would that be do you think? (Or does it have the same restrictions you outlined above?)
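
As a sketch of what I'm imagining (these functions don't exist in Remotery; the names are just placeholders):

#include <stdint.h>

/* Hypothetical API, names are placeholders. The begin call sends
   name + timestamp + thread id to the viewer right away; the matching
   end call, paired by id, can arrive many frames later. */
uint32_t rmt_SendLongEventBegin(const char* name);
void     rmt_SendLongEventEnd(uint32_t event_id);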

dwilliamson commented 2 years ago

Right now Remotery builds a sample tree on the thread being sampled and only sends the complete sample tree to the viewer/log once the root sample is closed.

The question you need to ask is: are there any other sensible periods when sending that tree is possible? On the thread being instrumented, the only entry points we have are when you push and pop samples. In this case you can't use Pop because the long-running event hasn't been closed yet, so you must use Push. And you can't really send the entire tree on each Push as traffic would be huge. So you need some kind of stochastic model which wouldn't always give you the results you want.

Right now, on Windows, this works well but it's not the solution I want because it's not cross-platform and interferes with the debugger. I see the following possible options, with increasing complexity:

  1. Port the thread suspend stalling checks to new platforms.
  2. Add a new solution that uses a custom thread to check sampling status in a lockless manner and copy trees.
    • Mostly easy as the tree is only ever added to, but has corner cases with aggregate/recursive samples.
  3. Make instrumentation only drop events into a queue and not construct a tree on-the-fly (a sketch of such a queued event follows after this list).
    • Remotery thread would be responsible for building the tree before sending.
    • The Remotery thread would be able to check its active thread queues each time it wakes up for stalling samples.
  4. Make instrumentation only drop events into a queue and not construct a tree on-the-fly.
    • Events are streamed untouched to the viewer.
    • The viewer is then responsible for building the tree.
    • No need for a stalling sample checker as events are streamed live.
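
As a rough idea of how small the queued event in options 3 and 4 could be (the struct below is illustrative, not an existing Remotery type):

#include <stdint.h>

/* Illustrative only. Each push/pop writes one fixed-size record into a
   per-thread queue; whoever builds the tree (the Remotery thread in
   option 3, the viewer in option 4) pairs begins with ends afterwards. */
typedef struct QueuedSampleEvent
{
    uint32_t name_hash;  /* hashed sample name */
    uint32_t thread_id;  /* thread that pushed/popped the sample */
    uint64_t time_us;    /* timestamp in microseconds */
    uint8_t  is_begin;   /* 1 = sample pushed, 0 = sample popped */
} QueuedSampleEvent;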

I like 4 the best, but it relies on JavaScript performance for building the tree. I'm in the middle of a viewer performance overhaul and can't effectively measure the impact of 4 yet, but it will become clear soon. I expect both Chrome and Firefox to do well optimising the hot paths, but I can't be sure.

Also with number 4, there is no sample tree on the server anymore so the Sample Tree API will break. Number 3 wouldn't have this issue.

Number 2 is probably the easiest win right now, as I don't believe 1 is possible on all platforms.

dwilliamson commented 2 years ago

I have plans to do 3 or 4 so that Remotery can be used for micro-benchmarking as well as the stuff we use it for. When I get around to that is another issue.

dwilliamson commented 2 years ago

An option for forks could be: add an API function that manually checks for stalling samples on the thread it's called from and sends the partial tree if one is stalling. Very low-tech, but it would get you what you want with minimal changes.
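
For example, something along these lines (the function is hypothetical, not part of Remotery):

/* Hypothetical - not an existing Remotery function. Checks the calling
   thread's sample tree and, if its root sample has been open past some
   threshold, sends the tree so far to the viewer as a partial tree. */
void rmt_SendPartialTreeIfStalling(void);

/* Usage sketch: call it periodically from the instrumented thread,
   e.g. once per iteration of a long loading loop. */
void LoadingLoop(int (*has_more_work)(void), void (*do_some_work)(void))
{
    while (has_more_work())
    {
        do_some_work();
        rmt_SendPartialTreeIfStalling();
    }
}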

JCash commented 2 years ago

Thanks again for your full answers! It sounds like number 4 is close to what I had in mind as well.

dwilliamson commented 2 years ago

Actually, just to confirm, because long-running events are supported on all platforms: is this question just about that? Because I'm answering the question about long-running events not showing up in the viewer until they complete.

You just rmt_BeginCPUSample and rmt_EndCPUSample as normal, ensuring they are the root of their trees. They will get sent when they close.
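
For example, on a dedicated loading thread (LoadResourceFromDisk is a stand-in for your own code):

#include "Remotery.h"

void LoadResourceFromDisk(void* resource);  /* stand-in for your own code */

/* The loading sample is the root of this thread's tree, so it gets sent to
   the viewer as soon as rmt_EndCPUSample closes it, even if that happens
   many frames after it began. */
void LoadThreadEntry(void* resource)
{
    rmt_BeginCPUSample(LoadResource, RMTSF_None);
    LoadResourceFromDisk(resource);
    rmt_EndCPUSample();
}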

dwilliamson commented 2 years ago

Another option that would work for you: we add a new sample flag, RMTSF_SendToViewer (or something), that sends the partial tree to the viewer when you close that sample. That way you can apply this to your loading samples and get them sent to the viewer on close, without having to make them root samples.

JCash commented 2 years ago

> Actually, just to confirm, because long-running events are supported on all platforms: is this question just about that?

I think so, yes.

> You just rmt_BeginCPUSample and rmt_EndCPUSample as normal, ensuring they are the root of their trees. They will get sent when they close.

I'm not entirely sure what I would need to do here. Code-wise, I'm in the middle of the call stack, with multiple rmt_BeginCPUSamples active.

dwilliamson commented 2 years ago

I might be able to sort something out...

dwilliamson commented 2 years ago

Can you try fadba34b975ddf23e739c16e7c1100dbf870c6de? I haven't tested it.

Mark whatever loading samples you have as RMTSF_SendOnClose and they should get sent. The flag comments are important for extra context.

JCash commented 2 years ago

Thanks! I tried it but I couldn't make it work (perhaps I did it wrong):

In its simplest form, it crashes due to the recursion depth (87000) in the SampleTree_CopySample function:

rmt_BeginCPUSample(TestName, RMTSF_SendOnClose);
    dmTime::Sleep(10000);
rmt_EndCPUSample();

Also, I think I need to clarify further what I'm after, code-wise. A little bit of pseudocode:

void UpdatePendingResources(Context* ctx)
{
    rmt_BeginCPUSample(UpdatePendingResources, RMTSF_Aggregate);

    for (Item* item : ctx->NewWork()) // 0..x number of items
    {
        // Start the profile scope for each unique resource ("file1.txt", "file2.txt" etc.)
        // rmt_BeginLongSample is the new API I'm imagining, not an existing function.
        uint32_t profile_scope_id = rmt_BeginLongSample(item->Path(), some_flags);
        item->StartWork(profile_scope_id);
    }

    for (Item* item : ctx->ProcessWork())
    {
        // 0..x number of items will be done per frame.
        // Closing the scopes will not happen in the same order as they were started.

        if (item->IsDone())
        {
            // Close the profile scope of this item (rmt_EndLongSample is also imagined)
            uint32_t profile_scope_id = item->GetProfileScope();
            rmt_EndLongSample(profile_scope_id);
        }
    }

    rmt_EndCPUSample();
}

dwilliamson commented 2 years ago

Oh, yes... I'm not sure how I'd even render that kind of information unless each resource had its own timeline, as everything overlaps. You'd get situations where some resources would completely hide other resources underneath them. The implementation is certainly not set up right now to close samples out of order, either.

dwilliamson commented 2 years ago

You'd need something like this: [image: each resource shown on its own timeline]

JCash commented 2 years ago

Yeah, I was a bit uncertain whether there was some support for it. Sorry for taking up your time.

I'd chalk this up as a feature request then :)

dwilliamson commented 2 years ago

I will remove the new flag for now.