I'd expect that the wall clock duration from the start of the frame interval (vsync) until we commit the frame to the compositor, regardless of threads/scheduling, would be the most useful to web authors.
The reasoning for that is that we have a fixed wall-clock duration (16.7ms) for each frame, and getting smooth rendering requires you to complete your work in under this time. How many threads were utilised in parallel, how much time they spent waiting on I/O, etc., seem like secondary concerns (and things that are probably much more interesting to the browser developers).
We also have the issue where Blink is trying to do content and compositing (for the same frame) within the same vsync interval (such that the combined wall clock time needs to be under 16ms), but Gecko is doing compositing of a frame on the following vsync interval (so content and compositing only need to be under 16ms each). Given that, it might make sense to expose the amount of wall-clock time remaining before a given frame misses the interval rather than (or maybe in addition to) the elapsed time.
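For illustration, a rough sketch (not part of any proposal) of how a page could approximate that "remaining budget" idea today, treating the requestAnimationFrame timestamp as the start of the frame interval and assuming a 60Hz display; doPerFrameWork is just a placeholder:

```js
// Rough sketch: estimate the wall-clock budget remaining in the current
// frame, assuming a 60Hz display (~16.7ms per vsync interval).
const FRAME_BUDGET_MS = 1000 / 60;

function doPerFrameWork() {
  // Placeholder for the page's per-frame work (script, DOM updates, etc.).
  for (let i = 0; i < 1e5; i++) Math.sqrt(i);
}

requestAnimationFrame((frameStart) => {
  doPerFrameWork();

  // Elapsed wall-clock time in this frame interval, and what is left before
  // the frame misses its deadline.
  const elapsed = performance.now() - frameStart;
  const remaining = FRAME_BUDGET_MS - elapsed;
  if (remaining < 0) {
    console.warn(`Missed the frame budget by ${(-remaining).toFixed(1)}ms`);
  }
});
```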
@mattwoodrow From reading the spec and explainer, the point behind cpuTime seems to be to act as a proxy for "how much of the frame budget you are using". The idea being that as a web developer I can record this information and go
I'm only using 1ms of the budget, so everything is awesome even on very slow devices; I can make my page much more expensive before I even come close to making the web page janky.
or
I'm using 15ms of the budget, so even a small increase will cause jank problems, and on slower devices I might already be causing them.
If this is the case, then I think the cpuTime value should include any work which is caused by the web developer. Can someone who actually wrote the specification chime in and confirm that my interpretation is correct (or did I miss something)?
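A hypothetical sketch of what that check might look like for a developer, assuming the draft API exposed frame entries on the performance timeline with a cpuTime attribute (the entry type name and attribute follow the draft discussion and are not final):

```js
// Hypothetical: assumes the draft Frame Timing API exposes "frame" entries
// with a cpuTime attribute, as discussed in this issue. Names are not final.
const FRAME_BUDGET_MS = 1000 / 60;

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const usedFraction = entry.cpuTime / FRAME_BUDGET_MS;
    if (usedFraction < 0.25) {
      // Plenty of headroom: the page could afford more work per frame.
    } else if (usedFraction > 0.9) {
      // Even a small increase in per-frame cost is likely to cause jank.
      console.warn(`Frame at ${entry.startTime} used ${entry.cpuTime}ms ` +
                   `of a ${FRAME_BUDGET_MS.toFixed(1)}ms budget`);
    }
  }
});
observer.observe({ entryTypes: ['frame'] }); // draft entry type name
```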
@mithro Yes, that is exactly what I think we want.
The initial comment of this issue suggests that we'll be recording "cpu time", which wouldn't achieve this goal afaict (since it doesn't increment while we're descheduled or waiting for IO).
@mattwoodrow - Sorry about the slow reply, this dropped out of my inbox.
The problem with using a time value which increments while descheduled or waiting on IO is that the web developer has basically no control over this behaviour. Even the browser has only minor control over these issues, because it all (mostly) happens up in the operating system kernel (as all modern OSes are preemptible).
What the developer does have a direct effect on, and hence control over, is the amount of CPU cycles the computer is burning when rendering their page.
Chrome has been recording this type of information for a while using a tool called Telemetry. In doing so we discovered that trying to use real time was a nightmare. The values varied widely from device to device (even when the devices should be identical) and even just run to run! The data was just too noisy to provide any useful metrics about things like performance improvements or regressions. We eventually discovered that using CPU time gave us much better results and allowed us to track these types of metrics. @natduca is the tech lead on Telemetry and can probably give you a much more detailed and nuanced explanation, backed by real-world data, about this issue.
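For context, a minimal sketch (this is not Telemetry, just an illustration with made-up numbers) of how one might quantify that run-to-run noise, by comparing the coefficient of variation of repeated wall-time versus CPU-time measurements:

```js
// Sketch: one way to quantify run-to-run noise in repeated measurements.
// A higher coefficient of variation means a noisier metric.
function coefficientOfVariation(samples) {
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / samples.length;
  return Math.sqrt(variance) / mean;
}

// Illustrative numbers only: wall-time samples swing with scheduling and
// background load, while CPU-time samples for the same work stay tighter.
const wallTimeMs = [9.8, 15.2, 10.1, 22.7, 11.4];
const cpuTimeMs = [5.1, 5.3, 5.0, 5.4, 5.2];
console.log(coefficientOfVariation(wallTimeMs)); // noticeably larger
console.log(coefficientOfVariation(cpuTimeMs));  // much smaller
```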
There might be other stats which are useful (maybe time spent waiting on IO?) but CPU time is provably valuable right now.
mainframe should capture all CPU costs of that frame: we need to define what this should include to be representative across different browsers; "it should include the cost for.... x,y,z"
Do we have a plausible list of what this should be? Layout, style recalculations, painting, ...?
I would guess the biggest contributions are JavaScript execution, applying CSS, and building/modifying the DOM, right?
Hmm, how about something like...
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on).
SGTM. Who else had an opinion on that one?
SGTM too. @natduca do you think this is missing anything important?
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on).
lgtm
The problem with using a time value which increments while descheduled or waiting on IO is that the web developer has basically no control over this behaviour
@mithro - if the developer relies on synchronous APIs (e.g. localStorage), then on some machines I/O operations can take a large part of their frame budget, if not more than that, and it is under their control (they can stop using these APIs). Exposing that info seems valuable.
I'm happy to go with whatever @natduca is happy with.
I did ponder whether we should just be providing the same stats that the OS provides (I/O wait, user time, CPU time), then realized nobody seems to understand them and they don't really work with the multiple threads that occur within a frame.
Ok, so I'm a bit confused... Looks like everyone is OK with this:
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on).
When I wrote that, I was still thinking of cpuTime as "wall time" in my head. That said, based on the last few comments, it sounds like we don't want that. Correct? If so, we should make it explicit that descheduled events should not be part of cpuTime. Perhaps:
When calculating the cpuTime for a frame, the user agent SHOULD include the time spent on JavaScript execution, accessing and modifying the DOM and CSSOM, and other relevant processing operations required to render the frame (e.g. layout, style recalculations, painting, and so on). The user agent SHOULD NOT include time spent on descheduled events (e.g. processing another frame, I/O time, and so on).
However, I agree with @mattwoodrow that wall-time is also important. You may be within your cpuTime budget, but still run really darn slow due to an overloaded or slow system, etc. Perhaps we should be surfacing both?
When calculating the wallTime for a frame, the user agent SHOULD include the actual time spent processing the frame, which includes the cpuTime to process the frame and time for all other operations during that time.
WDYT?
(based on a video chat w/ Michael)...
- cpuTime looks good.
- wallTime is better handled via the duration attribute (see #3), which we can update to capture the total time to process the frame, instead of defining it as the time between startTimes of subsequent frames. This allows developers to (a) detect if they're exceeding the frame budget, and if so (b) by how much, instead of just getting a 'previous frame was painted 33ms ago'.
This SGTM.
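If duration is redefined that way, a hypothetical consumer might look like the following (again assuming the draft "frame" entry type; none of these names are final):

```js
// Hypothetical: assumes entry.duration on a draft "frame" entry is redefined
// as the total time to process the frame, rather than the gap between the
// startTimes of successive frames.
const FRAME_BUDGET_MS = 1000 / 60;

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const overBudget = entry.duration - FRAME_BUDGET_MS;
    if (overBudget > 0) {
      // (a) we exceeded the frame budget, and (b) by this much.
      console.warn(`Frame over budget by ${overBudget.toFixed(1)}ms`);
    }
  }
}).observe({ entryTypes: ['frame'] });
```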
Sorry I'm a bit late to the party.
I think for cpuTime, we should use wall clock, which would include descheduled (in the OS sense) time, but exclude idle (in the Chrome sense) time. The reason for this is that thread time also varies between different OSes (I don't think there's a standard way of getting that info). The other reason is that we'll end up reporting work done (even if we pack tasks without idle time). I think that information is very useful, regardless of whether the developer has control over it (at the very least, it acts as a signal for further investigation).
Furthermore, I believe we should be reporting this work done summed over all threads that do the work, which is something that I don't think we explicitly addressed yet (or I missed it).
Maybe this means that we should call this something other than cpuTime, since that has a comp-sci connotation of measuring only scheduled time (in the OS sense).
As an aside, the difference from duration is that duration should also include idle time, which would end up measuring, end to end, how long the frame took.
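A worked example of that distinction, with made-up numbers, assuming cpuTime is the non-idle work summed across the threads that touched the frame and duration is the end-to-end time including idle:

```js
// Made-up numbers to illustrate the proposed distinction.
const frame = {
  mainThreadWorkMs: 4.0,       // script, style, layout on the main thread
  compositorThreadWorkMs: 2.5, // compositing work on the compositor thread
  idleMs: 6.0,                 // pipeline waiting (e.g. on vsync), doing no work
};

// cpuTime: work done, summed over all threads, excluding idle time.
const cpuTime = frame.mainThreadWorkMs + frame.compositorThreadWorkMs; // 6.5ms

// duration: end-to-end time for the frame, including idle time. (This sum
// assumes the two threads' work does not overlap in wall-clock time; if it
// did, duration would be smaller than cpuTime + idleMs.)
const duration = cpuTime + frame.idleMs; // 12.5ms

console.log({ cpuTime, duration });
```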
As I commented in https://github.com/w3c/frame-timing/pull/22, based on what @igrigorik commented 13 days ago, I took the following to be our agreement:
I believe strongly we need a cpuTime; a cpuTime in wall time is going to be too noisy to produce good data for users. The primary reason for this is that the time you spend descheduled by the OS is of a similar order of magnitude to the time taken to render a frame (both are milliseconds).
@natduca was the person who impressed on me the importance of cpuTime being actual processing used and not wall time taken. I understand that in his (very extensive) experience with telemetry and tracing, a wall time value has been demonstrably too noisy to be useful. I also understand this was the case even in highly controlled environments. @natduca - can you confirm that I'm not misinterpreting what you were saying?
In my own experience with measuring application performance on systems at Google before joining Chrome (not graphics rendering), I have also seen wall-time-based measurements containing huge amounts of noise.
Looking at what I believe to be the two use cases for cpuTime:
a) A web developer understanding how "expensive" their website is (on their own machine, while testing).
b) Collection of how your website performs on your users' machines.
Looking at case (a), I think;
Looking at case (b), a cpuTime based on wall time is going to be overwhelmed with noise: the same website, running on the same user's machine, is going to report widely different values depending on what else the user is doing. This means only the very largest websites, which will have huge data sets to average over, will have any hope of getting useful data here.
(Just finishing linking the two issues together.)
I believe the discussion around duration here seems to conflict with the discussion on duration in #3.
I spoke with @natduca, and he was very much in support of getting "cpu time" information into the performance timeline in various ways. I'll let him comment, but I believe he wanted to do it in a more holistic sense, not just in Frame Timing. I think his intention is that once everyone agrees on what standard cpu timing information to carry over into the platform, it would get added in multiple places and APIs. I don't think he wanted to make Frame Timing block on that discussion though.
The thought behind using a wall-clock version of cpuTime for Frame Timing is that if you wanted actual detailed measurements of each step of your pipeline, you would use Chrome's about:tracing or IE's Developer Tools or something similar. Frame Timing should be capturing what the real-world users are experiencing.
That said, if wall clock is just too noisy to be useful, maybe it's not helpful?
I'm trying to imagine how I would use this as a web developer. I would think I really just want to know A) Am I hitting my frame deadlines for a smooth page, or am I missing and janking? B) How much room do I have for extra work (or, if I'm janking, how much am I missing deadlines by)?
If you only have 5ms of work, but because of system load it takes 10ms to run (for example) then you don't have nearly as much spare capacity as a straight CPU cycles measurement would suggest and the 10ms number is more relevant and interesting, in my opinion.
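To make that concrete, a small made-up calculation of how the two measures disagree about spare capacity:

```js
// Made-up numbers: 5ms of actual work that takes 10ms of wall-clock time to
// complete because of system load.
const FRAME_BUDGET_MS = 1000 / 60; // ~16.7ms
const cpuWorkMs = 5;   // a straight CPU-cycles measurement
const wallTimeMs = 10; // what the frame actually took on the loaded system

console.log(FRAME_BUDGET_MS - cpuWorkMs);  // ~11.7ms of apparent headroom
console.log(FRAME_BUDGET_MS - wallTimeMs); // ~6.7ms of real headroom
```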
Feel free to disagree though! @mithro - do you have a different use case in mind?
Thanks, -Mike
Let's continue the discussion in #22; closing.
I agree we need "cpu timing", but it is probably something we should add to the entire performance timeline ecosystem, instead of something just for the frame timing spec. E.g. we can defer cpu timing to a separate spec. For now let's focus on precise definitions of the main frame events and precise definitions of duration versus cost, but actually report cost in terms of wall-time cost.
The current proposal defines cpuTime on PerformanceMainFrameTiming but omits it on PerformanceCompositeTiming, with the thinking that cpuTime is the total across all threads. Instead, should we expose cpuTime on both events? That would make the interface more consistent and provide a more granular view into where the cpuTime is spent. The new definition would be something like:
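The definition text itself isn't included above; purely as an illustration of the consistency argument, a hypothetical consumer of cpuTime on both entry types might look something like this (all interface and entry type names here are guesses based on the draft discussion):

```js
// Hypothetical consumer, not the proposed definition text. Assumes both
// PerformanceMainFrameTiming and PerformanceCompositeTiming entries carry
// their own cpuTime; the entry type strings below are illustrative guesses.
new PerformanceObserver((list) => {
  let mainFrameCpuMs = 0;
  let compositeCpuMs = 0;
  for (const entry of list.getEntries()) {
    if (entry.entryType === 'mainframe') mainFrameCpuMs += entry.cpuTime;
    if (entry.entryType === 'composite') compositeCpuMs += entry.cpuTime;
  }
  console.log({
    mainFrameCpuMs,
    compositeCpuMs,
    totalCpuMs: mainFrameCpuMs + compositeCpuMs,
  });
}).observe({ entryTypes: ['mainframe', 'composite'] });
```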