I'd expect that the wall clock duration from the start of the frame interval (vsync) until we commit the frame to the compositor, regardless of threads/scheduling, would be the most useful to web authors.
The reasoning for that is that we have a fixed wall-clock duration (16.7ms) for each frame, and getting smooth rendering requires you to complete your work in under this time. How many threads were utilised in parallel, how much time they spent waiting on I/O, etc., seem like secondary concerns (and things that are probably much more interesting to the browser developers).
We also have the issue where Blink is trying to do content and compositing (for the same frame) within the same vsync interval (such that the combined wall clock time needs to be under 16ms), but Gecko is doing compositing of a frame on the following vsync interval (so content and compositing only need to be under 16ms each). Given that, it might make sense to expose the amount of wall-clock time remaining before a given frame misses the interval rather than (or maybe in addition to) the elapsed time.
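For illustration, a rough sketch (not part of any proposal) of how a page could approximate that "remaining budget" idea today, treating the requestAnimationFrame timestamp as the start of the frame interval and assuming a 60Hz display; doPerFrameWork is just a placeholder:

```js
// Rough sketch: estimate the wall-clock budget remaining in the current
// frame, assuming a 60Hz display (~16.7ms per vsync interval).
const FRAME_BUDGET_MS = 1000 / 60;

function doPerFrameWork() {
  // Placeholder for the page's per-frame work (script, DOM updates, etc.).
  for (let i = 0; i < 1e5; i++) Math.sqrt(i);
}

requestAnimationFrame((frameStart) => {
  doPerFrameWork();

  // Elapsed wall-clock time in this frame interval, and what is left before
  // the frame misses its deadline.
  const elapsed = performance.now() - frameStart;
  const remaining = FRAME_BUDGET_MS - elapsed;
  if (remaining < 0) {
    console.warn(`Missed the frame budget by ${(-remaining).toFixed(1)}ms`);
  }
});
```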
@mattwoodrow From reading the spec and explainer, the point behind cpuTime seems to be to act as a proxy for "how much of the frame budget you are using". The idea being that as a web developer I can record this information and go
I'm only using 1ms of the budget, so everything is awesome even on very slow devices; I can make my page much more expensive before I even come close to making the web page janky.
or
I'm using 15ms of the budget, so even a small increase will cause jank problems, and on slower devices I might already be causing them.
If this is the case, then I think the cpuTime value should include any work which is caused by the web developer. Can someone who actually wrote the specification chime in and confirm that my interpretation is correct (or did I miss something)?
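A hypothetical sketch of what that check might look like for a developer, assuming the draft API exposed frame entries on the performance timeline with a cpuTime attribute (the entry type name and attribute follow the draft discussion and are not final):

```js
// Hypothetical: assumes the draft Frame Timing API exposes "frame" entries
// with a cpuTime attribute, as discussed in this issue. Names are not final.
const FRAME_BUDGET_MS = 1000 / 60;

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const usedFraction = entry.cpuTime / FRAME_BUDGET_MS;
    if (usedFraction < 0.25) {
      // Plenty of headroom: the page could afford more work per frame.
    } else if (usedFraction > 0.9) {
      // Even a small increase in per-frame cost is likely to cause jank.
      console.warn(`Frame at ${entry.startTime} used ${entry.cpuTime}ms ` +
                   `of a ${FRAME_BUDGET_MS.toFixed(1)}ms budget`);
    }
  }
});
observer.observe({ entryTypes: ['frame'] }); // draft entry type name
```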
@mithro Yes, that is exactly what I think we want.
The initial comment of this issue suggests that we'll be recording "cpu time", which wouldn't achieve this goal afaict (since it doesn't increment while we're descheduled or waiting for IO).
@mattwoodrow - Sorry about the slow reply, this dropped out of my inbox.
The problem with using a time value which increments while descheduled or waiting on IO is that the web developer has basically no control over this behaviour. Even the browser has only minor control over these issues, because it all (mostly) happens up in the operating system kernel (as all modern OSes are preemptible).
What the developer does have a direct effect on, and hence control over, is the amount of CPU cycles the computer is burning when rendering their page.
Chrome has been recording this type of information for a while using a tool called Telemetry. In doing so we discovered that trying to use real time was a nightmare. The values varied widely from device to device (even when the devices should be identical) and even just run to run! The data was just too noisy to provide any useful metrics about things like performance improvements or regressions. We eventually discovered that using CPU time gave us much better results and allowed us to track these types of metrics. @natduca is the tech lead on Telemetry and can probably give you a much more detailed and nuanced explanation, backed by real-world data, about this issue.
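For context, a minimal sketch (this is not Telemetry, just an illustration with made-up numbers) of how one might quantify that run-to-run noise, by comparing the coefficient of variation of repeated wall-time versus CPU-time measurements:

```js
// Sketch: one way to quantify run-to-run noise in repeated measurements.
// A higher coefficient of variation means a noisier metric.
function coefficientOfVariation(samples) {
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / samples.length;
  return Math.sqrt(variance) / mean;
}

// Illustrative numbers only: wall-time samples swing with scheduling and
// background load, while CPU-time samples for the same work stay tighter.
const wallTimeMs = [9.8, 15.2, 10.1, 22.7, 11.4];
const cpuTimeMs = [5.1, 5.3, 5.0, 5.4, 5.2];
console.log(coefficientOfVariation(wallTimeMs)); // noticeably larger
console.log(coefficientOfVariation(cpuTimeMs));  // much smaller
```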
There might be other stats which are useful (maybe time spent waiting on IO?) but CPU time is provably valuable right now.
mainframe should capture all CPU costs of that frame: we need to define what this should include to be representative across different browsers; "it should include the cost for.... x,y,z"
Do we have a plausible list of what this should be? Layout, style recalculations, painting, ...?
I would guess the biggest contributions are JavaScript execution, applying CSS, and building/modifying the DOM, right?
Hmm, how about something like...
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on).
SGTM. Who else had an opinion on that one?
SGTM too. @natduca do you think this is missing anything important?
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on).
lgtm
The problem with using a time value which increments while descheduled or waiting on IO is that the web developer has basically no control over this behaviour
@mithro - if the developer relies on synchronous APIs (e.g. localStorage), then on some machines I/O operations can take a large part of their frame budget, if not more than that, and it is under their control (they can stop using these APIs). Exposing that info seems valuable.
I'm happy to go with whatever @natduca is happy with.
I did ponder whether we should just be providing the same stats that the OS provides (I/O wait, user time, CPU time), then realized nobody seems to understand them and they don't really work with the multiple threads that occur within a frame.
Ok, so I'm a bit confused... Looks like everyone is OK with this:
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on).
When I wrote that, I was still thinking of cpuTime as "wall time" in my head. That said, based on the last few comments, it sounds like we don't want that. Correct? If so, we should make it explicit that descheduled events should not be part of cpuTime. Perhaps:
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on). The user agent SHOULD NOT include time spent on descheduled events (e.g. processing another frame, I/O time, and so on).
However, I agree with @mattwoodrow that wall-time is also important. You may be within your cpuTime budget, but still run really darn slow due to an overloaded or slow system, etc. Perhaps we should be surfacing both?
When calculating the wallTime for a frame, the user agent SHOULD include the actual time spent processing the frame, which includes the cpuTime to process the frame and time for all other operations during that time.
WDYT?
(based on a video chat w/ Michael)...
- cpuTime looks good.
- wallTime is better handled via the duration attribute (see #3), which we can update to capture the total time to process the frame, instead of defining it as the time between startTimes of subsequent frames. This allows developers to (a) detect if they're exceeding the frame budget, and if so (b) by how much, instead of just getting a 'previous frame was painted 33ms ago'.
This SGTM.
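If duration is redefined that way, a hypothetical consumer might look like the following (again assuming the draft "frame" entry type; none of these names are final):

```js
// Hypothetical: assumes entry.duration on a draft "frame" entry is redefined
// as the total time to process the frame, rather than the gap between the
// startTimes of successive frames.
const FRAME_BUDGET_MS = 1000 / 60;

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const overBudget = entry.duration - FRAME_BUDGET_MS;
    if (overBudget > 0) {
      // (a) we exceeded the frame budget, and (b) by this much.
      console.warn(`Frame over budget by ${overBudget.toFixed(1)}ms`);
    }
  }
}).observe({ entryTypes: ['frame'] });
```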
Sorry I'm a bit late to the party.
I think for cpuTime, we should use wall clock, which would include descheduled (in the OS sense) time, but exclude idle (in the Chrome sense) time. The reason for this is that thread time also varies between different OSes (I don't think there's a standard way of getting that info). The other reason is that we'll end up reporting work done (even if we pack tasks without idle time). I think that information is very useful, regardless of whether the developer has control over it (at the very least, it acts as a signal for further investigation).
Furthermore, I believe we should be reporting this work done summed over all threads that do the work, which is something that I don't think we explicitly addressed yet (or I missed it).
Maybe this means that we should call this something other than cpuTime, since that has a comp-sci connotation of measuring only scheduled time (in the OS sense).
As an aside, the difference from duration is that duration should also include idle time, which would end up measuring, end to end, how long the frame took.
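A worked example of that distinction, with made-up numbers, assuming cpuTime is the non-idle work summed across the threads that touched the frame and duration is the end-to-end time including idle:

```js
// Made-up numbers to illustrate the proposed distinction.
const frame = {
  mainThreadWorkMs: 4.0,       // script, style, layout on the main thread
  compositorThreadWorkMs: 2.5, // compositing work on the compositor thread
  idleMs: 6.0,                 // pipeline waiting (e.g. on vsync), doing no work
};

// cpuTime: work done, summed over all threads, excluding idle time.
const cpuTime = frame.mainThreadWorkMs + frame.compositorThreadWorkMs; // 6.5ms

// duration: end-to-end time for the frame, including idle time. (This sum
// assumes the two threads' work does not overlap in wall-clock time; if it
// did, duration would be smaller than cpuTime + idleMs.)
const duration = cpuTime + frame.idleMs; // 12.5ms

console.log({ cpuTime, duration });
```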
As I commented in https://github.com/w3c/frame-timing/pull/22, based on what @igrigorik commented 13 days ago, I took the following to be our agreement:
I believe strongly we need a cpuTime; a cpuTime in wall time is going to be too noisy to produce good data for users. The primary reason for this is that the time you spend descheduled by the OS is of a similar order of magnitude to the time taken to render a frame (both are milliseconds).
@natduca was the person who impressed on me the importance of cpuTime being actual processing used and not wall time taken. I understand that in his (very extensive) experience with telemetry and tracing, a wall time value has been demonstrably too noisy to be useful. I also understand this was the case even in highly controlled environments. @natduca - can you confirm that I'm not misinterpreting what you were saying?
In my own experience with measuring application performance on systems at Google before joining Chrome (not graphics rendering), I have also seen wall-time-based measurements containing huge amounts of noise.
Looking at what I believe to be the two use cases for cpuTime:
a) A web developer understanding how "expensive" their website is (on their own machine, while testing).
b) Collection of how your website performs on your users' machines.
Looking at case (a), I think;
Looking at case (b), a cpuTime based on wall time is going to be overwhelmed with noise: the same website, running on the same user's machine, is going to report widely different values depending on what else the user is doing. This means only the very largest websites, which will have huge data sets to average over, will have any hope of getting useful data here.
(Just finishing linking the two issues together.)
I believe the discussion around duration here seems to conflict with the discussion on duration in #3.
I spoke with @natduca, and he was very much in support of getting "cpu time" information into the performance timeline in various ways. I'll let him comment, but I believe he wanted to do it in a more holistic sense, not just in Frame Timing. I think his intention is that once everyone agrees on what standard cpu timing information to carry over into the platform, it would get added in multiple places and APIs. I don't think he wanted to make Frame Timing block on that discussion though.
The thought behind using a wall-clock version of cpuTime for Frame Timing is that if you wanted actual detailed measurements of each step of your pipeline, you would use Chrome's about:tracing or IE's Developer Tools or something similar. Frame Timing should be capturing what the real-world users are experiencing.
That said, if wall clock is just too noisy to be useful, maybe it's not helpful?
I'm trying to imagine how I would use this as a web developer. I would think I really just want to know A) Am I hitting my frame deadlines for a smooth page, or am I missing and janking? B) How much room do I have for extra work (or, if I'm janking, how much am I missing deadlines by)?
If you only have 5ms of work, but because of system load it takes 10ms to run (for example) then you don't have nearly as much spare capacity as a straight CPU cycles measurement would suggest and the 10ms number is more relevant and interesting, in my opinion.
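To make that concrete, a small made-up calculation of how the two measures disagree about spare capacity:

```js
// Made-up numbers: 5ms of actual work that takes 10ms of wall-clock time to
// complete because of system load.
const FRAME_BUDGET_MS = 1000 / 60; // ~16.7ms
const cpuWorkMs = 5;   // a straight CPU-cycles measurement
const wallTimeMs = 10; // what the frame actually took on the loaded system

console.log(FRAME_BUDGET_MS - cpuWorkMs);  // ~11.7ms of apparent headroom
console.log(FRAME_BUDGET_MS - wallTimeMs); // ~6.7ms of real headroom
```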
Feel free to disagree though! @mithro - do you have a different use case in mind?
Thanks, -Mike
Let's continue the discussion in #22; closing.
I agree we need "cpu timing", but it is probably something we should add to the entire performance timeline ecosystem, instead of something just for the frame timing spec. E.g. we can defer cpu timing to a separate spec. For now let's focus on precise definitions of the main frame events and precise definitions of duration versus cost, but actually report cost in terms of wall-time cost.
The current proposal defines cpuTime on PerformanceMainFrameTiming but omits it on PerformanceCompositeTiming, with the thinking that cpuTime is the total across all threads. Instead, should we expose cpuTime on both events? That would make the interface more consistent and provide a more granular view into where the cpuTime is spent. The new definition would be something like:
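The definition text itself isn't included above; purely as an illustration of the consistency argument, a hypothetical consumer of cpuTime on both entry types might look something like this (all interface and entry type names here are guesses based on the draft discussion):

```js
// Hypothetical consumer, not the proposed definition text. Assumes both
// PerformanceMainFrameTiming and PerformanceCompositeTiming entries carry
// their own cpuTime; the entry type strings below are illustrative guesses.
new PerformanceObserver((list) => {
  let mainFrameCpuMs = 0;
  let compositeCpuMs = 0;
  for (const entry of list.getEntries()) {
    if (entry.entryType === 'mainframe') mainFrameCpuMs += entry.cpuTime;
    if (entry.entryType === 'composite') compositeCpuMs += entry.cpuTime;
  }
  console.log({
    mainFrameCpuMs,
    compositeCpuMs,
    totalCpuMs: mainFrameCpuMs + compositeCpuMs,
  });
}).observe({ entryTypes: ['mainframe', 'composite'] });
```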