catalyst / moodle-tool_excimer

A Moodle tool to find bottlenecks in your code safely in production
https://moodle.org/plugins/tool_excimer
GNU General Public License v3.0

Bonus points: Compress extremely large profiles so they have a fixed upper limit #24

Closed · brendanheywood closed this issue 2 years ago

brendanheywood commented 3 years ago

Let's say we have an extremely long profile, maybe an integration web service, something that runs on the CLI, or a very long report which hopefully streams. The larger the profile is, the less we care about the fine details; the bottleneck will be the same either way. Let's say we record cron or a report that takes 4 hours.

So the idea is that we can arbitrarily set some maximum size for the profile itself, and if larger profiles come in then we can reduce the sampling rate. I can see a few ways this might happen:

1) Can we tell Excimer to change the sampling frequency after it has started running? For instance, after 1 minute we double the sampling period, after 2 minutes we double it again, after 4 minutes we double it again (see the sketch after this list).

2) Even if we can't, we could just selectively ignore some of the callbacks.

3) After the fact, before we persist the profile, can we do some time slicing and reduce the resolution of the profile data?
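
A minimal sketch of the doubling schedule in option 1), as a hypothetical helper (not part of the plugin), assuming the period starts at 100ms:

```php
// Hypothetical helper: the sampling period doubles each time the elapsed
// time crosses the next threshold (1 min, 2 min, 4 min, ...).
function period_for_elapsed(float $elapsedseconds, float $baseperiod = 0.1): float {
    $threshold = 60.0;       // First doubling after 1 minute.
    $period = $baseperiod;
    while ($elapsedseconds >= $threshold) {
        $period *= 2;        // Double the period ...
        $threshold *= 2;     // ... and wait twice as long for the next doubling.
    }
    return $period;
}

// period_for_elapsed(30) -> 0.1, period_for_elapsed(90) -> 0.2,
// period_for_elapsed(300) -> 0.8 (all in seconds).
```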

Question: if a profile has variable sized time slices, will the flame graph just work because it knows the length of each time slice?

jaypha commented 2 years ago
  1. Yes. Calling ExcimerProfiler::start() will restart the timer using any new settings given. It should keep any existing logs.
  2. Each ExcimerLogEntry keeps the time of the sample, so we can filter out entries as they are processed.

Answer to Question: The flame graph should still display, but no time data is kept at this point, so the widths of the nodes will not vary according to time.
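
A minimal sketch of points 1 and 2, assuming the Excimer extension's ExcimerProfiler / ExcimerLog API (setPeriod(), start(), getLog(), ExcimerLogEntry::getTimestamp()); the surrounding variables are illustrative only:

```php
$period = 0.1; // Seconds between samples.
$profiler = new ExcimerProfiler();
$profiler->setPeriod($period);
$profiler->setEventType(EXCIMER_REAL);
$profiler->start();

// ... later, once the profile is getting large ...
$period *= 2;
$profiler->setPeriod($period);
$profiler->start(); // Restarts the timer with the new period, keeping the existing log (point 1).

// Point 2: each log entry keeps the time of its sample, so entries can be
// filtered or re-weighted as they are processed.
foreach ($profiler->getLog() as $entry) {
    $timestamp = $entry->getTimestamp(); // Time of this sample.
    // ... decide whether to keep or down-weight this sample ...
}
```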

jaypha commented 2 years ago

I have hand tested the half-life sampling and believe it to be working. The sampling period gets progressively longer each time the partial save period is triggered. I have seen it working even when the sampling period exceeds the partial save period. It remains to be seen how it would work on a truly long running profile.

I have not been able to figure out how to make a unit test for it.

This has not been implemented yet for cron tasks. They still have flat timing.

As I mentioned above, flame graph data does not contain time information. Each node contains only a name and a number. For Excimer, it is the name of the function and the number of times it has been called. It is not even in chronological order. Therefore the flame graph data cannot 'stretch' as the time interval changes; it has no knowledge of it.
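
For reference, a rough illustration of the kind of name-plus-number tree described above (hypothetical values, not necessarily the plugin's exact storage format):

```php
// Each node is just a name and a count, plus children; there is no
// per-sample time information left to stretch.
$node = [
    'name' => 'root',
    'value' => 50, // Number of samples, not seconds.
    'children' => [
        ['name' => 'cron_run', 'value' => 48, 'children' => [/* ... */]],
        ['name' => 'some_other_function', 'value' => 2, 'children' => []],
    ],
];
```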

brendanheywood commented 2 years ago

Comments are based on just the comment above, not on reading the code, so they might be out of date:

I have seen it working even when the sampling period exceeds the partial save period.

Maybe let's tweak that so that if it exceeds it, the partial save period goes up as well. Ideally the partial save period should be some multiple of the sample period, i.e. we save every 200 samples.

In fact it probably makes more sense to convert long_interval_s to be measured in samples rather than seconds, to make this simpler. Can you see a downside to this?

This has not been implemented yet for cron tasks.

We need to fix this; cron is one of the worst offenders here and some tasks can take 10+ hours to run, sometimes days.

Each node contains only a name and a number.

I still think this is all we need and we can work around it. I can see two potential workarounds, each with its own pros and cons:

a) Let's say we've recorded 50 samples' worth of data at 100ms a sample and now we ratchet up to 200ms. Before we start adding on more data, we go through the first 50 and reduce them to 25 samples. If we were in a function for 20 samples then it would now only say 10. If we have sample a>b>c for 1 sample adjacent to sample a>b>d then we'd just pick the first one and throw away the second. There will be some rounding errors to deal with, but this should give exactly the same end result as if we'd just started with a sample period of 200ms (see the sketch below).

b) We do more or less the opposite of the above: instead of shrinking the samples in the past, we expand the samples in the future. But we can only do it after the fact. So let's say we sampled 100 samples @ 100ms = 10 seconds realtime, then we tack on another 100 samples @ 200ms = 20 seconds, then another 100 samples @ 400ms = 40 seconds, for a total of 70 seconds. If we know / record the points in time where the sampling changed, then we can go through all the samples 100-200 (10-30 seconds) and double their numbers, all the samples from 200-300 (30-70 seconds) and x4, and so on. This has the benefit of not throwing away any data at the start of the profile. But it will also mean the graph starts finer grained and gets grainier. I don't think the data at the start will be any more valuable than any other arbitrary part of the profile.

If forced to pick I'd go with option a) because I suspect the logic will be simpler, and it will mean that we are culling the data as we go, so the in-memory footprint will be a constant size. Doing it the second way will have a footprint of O(log(n)), but even this would be totally fine anyway. Another way of expressing it is that we will never have more than 2**N samples for some fixed value of N. Realistically, when looking at the graph on a big monitor you are never going to meaningfully display more than 1024 samples, and most of the time it will be less than that. Yes, we can store more and drill in, but when looking for a bottleneck you probably only need double digits to get a pointer at where the issue lies. So I don't think we'd ever want to store more than, say, 2048 or so samples in a profile.
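
A rough sketch of option a), assuming the samples are held as collapsed 'stack => count' pairs (an illustrative structure, not necessarily the plugin's actual storage):

```php
// Hypothetical helper: when the sample budget is hit, halve the existing
// counts so they carry the same weight as samples taken at the new,
// doubled period.
function compress_samples(array $samples): array {
    $out = [];
    foreach ($samples as $stack => $count) {
        // Integer-halve each count; stacks that round down to zero are
        // dropped, which is where the small rounding errors come from.
        $half = intdiv($count, 2);
        if ($half > 0) {
            $out[$stack] = $half;
        }
    }
    return $out;
}

$samples = ['a;b;c' => 20, 'a;b;d' => 1, 'a;e' => 9];
$samples = compress_samples($samples); // ['a;b;c' => 10, 'a;e' => 4]
$sampleperiodms = 100 * 2;             // Carry on sampling at 200ms.
```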

It is not even in chronological order.

That feels like it can't be possible? Are you talking completely out of order or just some small glitches here and there?

brendanheywood commented 2 years ago

Consensus: let's aim for 1024 samples as the default.

jaypha commented 2 years ago

Hi

There is an issue with applying the increasing sample period to cron tasks. When a long task comes to an end, the sampling period can be so long that a number of following tasks would be skipped entirely before there is a chance to reset the sample rate.

An alternative would be to keep the sampling rate fixed, but only record every nth sample according to the doubling rate. Then when a task has finished, the code can immediately go back to recording every sample, and not miss the 'smaller' ones.

The half-life would need to be an integer multiple of the sampling period for this to work properly.

I think it would be neater this way anyway. What do you think?
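
A light sketch of that alternative: the profiler keeps its original period, the per-sample work is just a counter check, and n resets to 1 between tasks (names are hypothetical, not the plugin's code):

```php
$n = 1;       // Record every sample to begin with; doubles at each half-life.
$counter = 0;

function maybe_record(array $trace, array &$samples, int &$counter, int $n): void {
    $counter++;
    if ($counter % $n !== 0) {
        return; // Skip this sample cheaply.
    }
    $stack = implode(';', $trace);
    $samples[$stack] = ($samples[$stack] ?? 0) + 1;
}
```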

And a clarification. Does 1024 mean it stops recording once it reaches 1024 samples?

brendanheywood commented 2 years ago

I'm happy to keep sampling but only process every nth sample, with the caveat that the logic done while sampling but not processing must be very light, e.g. no cache access and no DB access, just a bit of CPU.

And a clarification. Does 1024 mean it stops recording once it reaches 1024 samples?

No, I meant that when it got to 1024 (or whatever it was set at) it would go through and compress the current samples down to 512 samples, then double the sampling period, and then it would slowly wind its way back up to 1024 again, rinse and repeat.

jaypha commented 2 years ago

So 1024 is the half life?

brendanheywood commented 2 years ago

1024 is the maximum number of samples we will ever store for a given profile. When we get to that value we will double the values of sample_ms and long_interval_s in working memory.

The term 'half-life' is probably not the most accurate label, as firstly it should be a time-based unit, which was one of the earlier contenders for how this might be configured. Let's just call it a maximum sample size and, in the description, describe how it ramps up the other values.
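
Putting the last few comments together, a rough sketch of the ramp-up (the variable names and the 'stack => count' storage are illustrative, not the plugin's code):

```php
$maxsamples = 1024;  // The "maximum sample size" setting.
$samplems = 100;     // Working copy of sample_ms.
$longintervals = 10; // Working copy of long_interval_s.
$samples = [];       // 'stack => count' pairs, as in the earlier sketch.

function maybe_compress(array &$samples, int &$samplems, int &$longintervals, int $maxsamples): void {
    if (array_sum($samples) < $maxsamples) {
        return;
    }
    // Compress the stored samples down to roughly half.
    foreach ($samples as $stack => $count) {
        $samples[$stack] = intdiv($count, 2);
    }
    $samples = array_filter($samples); // Drop stacks that rounded down to zero.
    // Double the working settings; each new sample is now worth twice as much
    // wall time, and the profile winds its way back up towards the maximum.
    $samplems *= 2;
    $longintervals *= 2;
}
```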