Open JCash opened 2 years ago
It does support this but requires the thread sampler to be active in order to achieve this. It does so by:
ThreadProfiler
for each thread.CheckForStallingSamples
).This only works on Windows as it requires implementations for:
rmtGetCurrentThreadId
rmtSuspendThread
rmtResumeThread
It is currently intermingled with the thread sampler, which you do not need for this stalling functionality. So two things would need to be done to get this working on other platforms:
Worth mentioning that suspending the thread is not the most ideal way to check for in-progress sample trees. It's just the most convenient place for the check right now. The best solution would be a custom thread that wakes up every 30ms or so, checks for stalling trees and somehow copies them safely for communication as a "partial tree."
The current variable treeBeingModified
is only safe to use because the thread that uses it is suspended. It's not a generic synchronisation primitive.
Thanks! That's good to know, and it does indeed sound a bit complex.
I've mentioned another approach earlier, by "manually" adding info to remotery (if possible) (See https://github.com/Celtoys/Remotery/issues/193).
E.g. sending a start event, with name+time+threadid, and then later on, sending the corresponding end event.
How feasible would that be do you think? (Or does it have the same restrictions you outlined above?)
Right now Remotery builds a sample tree on the thread being sampled and only sends the complete sample tree to the viewer/log once the root sample is closed.
The question you need to ask is: are there any other sensible periods when sending that tree is possible? On the thread being instrumented, the only entry points we have are when you push and pop samples. In this case you can't use Pop because the long-running event hasn't been closed yet, so you must use Push. And you can't really send the entire tree on each Push as traffic would be huge. So you need some kind of stochastic model which wouldn't always give you the results you want.
Right now, on Windows, this works well but it's not the solution I want because it's not cross-platform and interferes with the debugger. I see the following possible options, with increasing complexity:
I like 4 the best but it relies on Javascript performance building the tree. I'm in the middle of a viewer performance overhaul and can't effectively measure the impact of 4 yet, but it will become clear soon. I expect both Chrome/Firefox to do well optimising the hot paths but I can't be sure.
Also with number 4, there is no sample tree on the server anymore so the Sample Tree API will break. Number 3 wouldn't have this issue.
Number 2 is probably the easiest win right now as I don't believe 1 possible on all platforms.
I have plans to do 3 or 4 so that Remotery can be used for micro-benchmarking as well as the stuff we use it for. When I get around to that is another issue.
An option for forks could be: add an API function that manually checks for stalling samples on the thread the function is called from and send the partial tree if it's stalling. Very low-tech but would get you what you want with minimal changes.
Thanks again for your full answers! It sounds like number 4 is close to what I had in mind as well.
Actually, just to confirm, because long-running events are supported on all platforms: is this question just about that? Because I'm answering the question about long-running events not showing up in the viewer until they complete.
You just rmt_BeginCPUSample
and rmt_EndCPUSample
as normal, ensuring they are the root of their trees. They will get sent when they close.
Another option that would work for you: we add a new sample flag, RMTSF_SendToViewer
(or something), that sends the partial tree to the viewer when you close that sample. That way you can apply this to your loading samples and get them sent to the viewer on close, without having to make them root samples.
Actually, just to confirm, because long-running events are supported on all platforms: is this question just about that?
I think so yes.
You just rmt_BeginCPUSample and rmt_EndCPUSample as normal, ensuring they are the root of their trees. They will get sent when they close.
I'm not entirely sure what I would need to do here? Code-wise, i'm in the middle of the callstack, with multiple rmt_BeginCPUSample's active.
I might be able to sort something out...
Can you try fadba34b975ddf23e739c16e7c1100dbf870c6de? I haven't tested it.
Mark whatever loading samples you have as RMTSF_SendOnClose
and they should get sent. The flag comments are important for extra context.
Thanks! I tried it but I couldn't make it work (perhaps I did it wrong):
In its simplest form, it crashes due to the recursive depth (87000) in the SampleTree_CopySample function.
rmt_BeginCPUSample(TestName, RMTSF_SendOnClose);
dmTime::Sleep(10000);
rmt_EndCPUSample();
Also, I think I need to clarify further what I'm after code wise. A little bit of pseudo code:
void UpdatePendingResources(Context* ctx)
{
rmt_BeginCPUSample(__FUNCTION__, RMTSF_Aggregate);
for (item in ctx->NewWork()) // 0..x number of items
{
// start the profile scope for each unique resource ("file1.txt", "file2.txt" etc)
uint32_t profile_scope_id = rmt_BeginLongSample(item->Path(), some_flags);
item->StartWork(profile_scope_id);
}
for (item in ctx->ProcessWork())
{
// 0..x number of items will be done per frame
// Closing the scopes will not be done in the same order as they were started.
if (item->IsDone())
{
// close the profile scope of this item
uint32_t profile_scope_id = item->GetProfileScope();
rmt_EndLongSample(profile_scope_id);
}
}
rmt_EndCPUSample();
}
Oh, yes... I'm not sure how I'd even render that kind of information unless each resource had its own timeline as everything overlaps. You'd get situations where some resources would completely hide other resources underneath it. The implementation is certainly not set up right now as well to close samples out of order.
You'd need something like this
Yeah, I was a bit uncertain if there was some support for it. Sorry for taking up your time.
I'd chalk this down to being a feature request then :)
I will remove the new flag for now.
I wish to be able to log events that span multiple frames. e.g. the loading of a resource in the game.
Example: https://i.stack.imgur.com/odD1S.png
By the looks of it in the Remotery example image (int the Readme.md), the "Processor timeline" is something related to that, but I'm not sure?
Is this something that is currently supported? If not, what would be the recommended way to do it (assuming it fits within the scope/vison of Remotery)
/Mathias