akamai / boomerang

End user oriented web performance testing and beaconing
http://akamai.github.io/boomerang/
Other
1.86k stars 292 forks source link

Possible bug: TTI timeline calculation sometimes starts early, reports early result #315

Closed scottgifford closed 3 years ago

scottgifford commented 3 years ago

Hello, two related issues we came across while deep-diving Boomerang on our site.

First, when Boomerang calculates what bucket to start in, if Boomerang was initialized after the page was visually ready, timeline analysis will start in a negative bucket; I see this happening quite frequently when I look at this in the debugger (in my last test it started at frame -33). Looking at this code, it will happen if startTime is greater than visuallyReady: https://github.com/akamai/boomerang/blob/master/plugins/continuity.js#L1671

            var startBucket = Math.floor((visuallyReady - startTime) / COLLECTION_INTERVAL);

A saving grace is that the negative time intervals will not be considered idle as long as the FPS timeline exists: https://github.com/akamai/boomerang/blob/master/plugins/continuity.js#L1680-L1684

                if (data.fps && (!data.fps[j] || data.fps[j] < TIME_TO_INTERACTIVE_MIN_FPS_PER_INTERVAL)) {
                    // No FPS or less than 20 FPS during this interval
                    idleIntervals = 0;
                    continue;
                }

The FPS timeline usually does exist by the time this starts, but occasionally it does not in my tests, and will result the timeline being ignored and visuallyReady being used as the only TTI indicator.

Second, it seems like Boomerang often (always?) reports the timeline-based TTI as 100ms too early, because of what looks to me like an off-by-one error.

In this overal loop: https://github.com/akamai/boomerang/blob/master/plugins/continuity.js#L1673-L1710

We do this check, starting on line 1703:

                if (idleIntervals >= TIME_TO_INTERACTIVE_IDLE_INTERVALS) {
                    tti = startTime + ((j - TIME_TO_INTERACTIVE_IDLE_INTERVALS) * COLLECTION_INTERVAL);

                    // ensure we don't set TTI before TTVR
                    tti = Math.max(tti, visuallyReady);
                    break;
                }

It is easiest to illustrate the problem with an example. If the timeline was idle in all of the first 5 buckets, then by the 5th bucket (j=4), on line 1703, idleIntervals will be 5 (we incremented it above, on line 1699). We will find that idleIntervals >= TIME_TO_INTERACTIVE_IDLE_INTERVALS is true, so will begin calculating the TTI, but since j is still 4, (j - TIME_TO_INTERACTIVE_IDLE_INTERVALS) will be -1, ((j - TIME_TO_INTERACTIVE_IDLE_INTERVALS) * COLLECTION_INTERVAL) will be -100ms, and so tti will be 100ms before startTime. We see this frequently in our own calculations, and I can see it happening in the debugger.

I think the problem is more general than just when the first 5 buckets are all idle; it seems to me that idleIntervals is always 1 frame ahead of j right after it is incremented, so the result will always be 100ms too early.

I wanted to see if either of these were happening on purpose, or if they are just bugs in the calculation?

Thanks!

----Scott.

scottgifford commented 3 years ago

FYI we have decided for our own site this is a bug and have fixed it to never look in negative buckets, and to avoid the off-by-one error with idleIntervals.

mc-chaos commented 3 years ago

Hi Scott, did you fork the boomerang repo for your changes? Would you create a PR for your changes to get them upstream? Regards, Sascha

scottgifford commented 3 years ago

Hi Sascha, we have not done that yet, I will have to go through a little bit of process at my work to release our changes back to open source, and wanted to get an idea of whether the upstream agreed they were appropriate and were reasonably likely to be merged before I did that.

nicjansma commented 3 years ago

@scottgifford apologies for the delay in reviewing your notes, I think you're spot-on.

In general, we always love PRs from the community and are appreciative of you digging into this issue.

I've opened a PR with (I think) your proposed changes: https://github.com/akamai/boomerang/pull/326

I've refactored the code a bit so that we can unit test it a bit better.

If you wouldn't mind looking them over, and letting me know if there are any other issues you've fixed, or if there's something I've missed?

Thanks!

scottgifford commented 3 years ago

Thanks Nic, took a look at the PR and it looks good to me!