catchpoint / WebPageTest

Official repository for WebPageTest

Overwritten user timing marks get wrongly exported #539

Closed · laurentgoudet closed this issue 9 years ago

laurentgoudet commented 9 years ago

I'm using @stevesouders' Hero Images Custom Metrics technique (http://www.stevesouders.com/blog/2015/05/12/hero-image-custom-metrics), with the difference that I do a clearMarks/mark in the load event handler so that there is a single timing mark for the metric I'm interested in. That works great in the DevTools (try performance.getEntriesByName('profile_image')[0].startTime or performance.getEntriesByName('cover_image')[0].startTime on https://www.freelancer.com/u/best1.html).
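For context, a minimal sketch of that variant, assuming an inline script placed right after the hero image (the 'profile-img' id is illustrative, not taken from the actual page):

```js
// Keep a single 'profile_image' mark that always holds the later of the
// inline-script time and the image load time.
function markHero() {
  performance.clearMarks('profile_image'); // drop any earlier mark
  performance.mark('profile_image');
}

// Runs as soon as the parser reaches this inline script, i.e. right
// after the <img> tag.
markHero();

var img = document.getElementById('profile-img');
if (!img.complete) {
  // Image not loaded yet: overwrite the mark when it finishes loading.
  img.addEventListener('load', markHero);
}
```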

However, while looking into why the SpeedCurve graphs were fluctuating that much, I found out that WPT was not reporting these marks correctly and they were not reflecting reality. Looking at http://www.webpagetest.org/video/compare.php?tests=151203_VH_fab308d054f763310634beef3390b8b9-r:1-c:1, it seems that WPT captures the first call to mark() for a given mark name and doesn't take into account the subsequent clearMarks + mark (done in the load handler if the image wasn't yet loaded).

Glancing at the code, the USER_TIMING_SCRIPT script defined in agent/browser/chrome/extension/release/wpt/background.js seems perfectly fine; however, if my (non-existent) C++ skills are right, that collectStats function is called every 100ms, and the HAR export confirms this (multiple instances of the same timing mark with different values).
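For reference, a collection script of this sort typically just serializes the recorded entries; a rough sketch of the idea (not the actual USER_TIMING_SCRIPT):

```js
// Serialize every user timing mark so the agent can attach it to the
// test result. Marks sharing a name all survive this step, which is
// where the first-vs-last question below comes in.
var marks = performance.getEntriesByType('mark').map(function (m) {
  return { name: m.name, startTime: m.startTime };
});
console.log(JSON.stringify(marks));
```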

TL;DR: can the user timing marks behave like in the DevTools, i.e. only export the last values?
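To illustrate what "the last values" means, assuming a mark set twice without an intervening clearMarks:

```js
performance.mark('profile_image'); // first call, say at ~1200ms
// ... later, e.g. in a load handler, without clearMarks():
performance.mark('profile_image'); // second call, say at ~2800ms

var entries = performance.getEntriesByName('profile_image');
entries.length;                        // 2: both marks are kept
entries[0].startTime;                  // the first value (~1200ms)
entries[entries.length - 1].startTime; // the last value (~2800ms)
```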

pmeenan commented 9 years ago

collectStats is only run at the end of a test. The 100ms interval is a polling loop for getting tasks from the C++ code up to the extension, but there is only one collectStats task, right at the end of the test.

Do you have code attached to multiple profile_image marks or just the one? Looking at the script after the image, it looks like it sets the mark as soon as the script is run (before the image loads) and then again in the onload handler for the profile image.

In the repeat view the profile image is coming out of disk cache, so I'm not surprised to see it fire before anything is rendered on the screen (the onload for the image element doesn't necessarily correlate with when it will be painted, and even the script block's execution doesn't guarantee that the image before it has actually been painted).

In the first-view run it looks like the profile image mark lines up with the end of the network load, which is pretty much where you'd expect to see it.

laurentgoudet commented 9 years ago

Do you have code attached to multiple profile_image marks or just the one?

Just the one

Looking at the script after the image, it looks like it sets the mark as soon as the script is run (before the image loads) and then again in the onload handler for the profile image.

As mentioned by Steve, "Therefore, to create a custom metric that determines when an image is displayed you should take the max of the values reported by the inline script timer and image onload." It seems to be the most accurate way to know when an image is displayed (http://stevesouders.com/tests/image-custom-metric.php), and I've confirmed that with multiple runs in the DevTools with the filmstrip view, with and without a primed cache.
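A minimal sketch of that take-the-max formulation, with two separate marks (the mark names are illustrative):

```js
// In the inline script right after the image:
performance.mark('profile_image_inline');

// In the image's onload handler:
performance.mark('profile_image_onload');

// When reporting, take the max of whichever marks were recorded:
var times = ['profile_image_inline', 'profile_image_onload']
  .map(function (name) {
    var entries = performance.getEntriesByName(name, 'mark');
    return entries.length ? entries[entries.length - 1].startTime : 0;
  });
var displayed = Math.max.apply(null, times);
```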

However, I do agree that, looking at the WPT results, the technique doesn't seem to work :(. The HAR export of http://www.webpagetest.org/video/compare.php?tests=151203_VH_fab308d054f763310634beef3390b8b9-r:1-c:0 still contains 2 different sets of values, but none of them seem to correlate with the actual user experience:

Real (from the filmstrip): 3.2s, 4.6s
Reported (in the waterfall): 2.8s, 4.2s

First set:

"_date":1448888456,
"_userTime.profile_image":2839,
"_userTime.cover_image":5581,

Second set:

"_date":1448888495,
"_userTime.profile_image":2293,
"_userTime.cover_image":4400,

pmeenan commented 9 years ago

I expect the 2 sets in the HAR are for the first- and repeat-view runs, and I'm not surprised they don't match up. Steve's technique is the best that I'm aware of, but it's still not guaranteed, and the browsers don't expose any way for pages to know when things actually paint (it's been a LONG-standing ask).

laurentgoudet commented 9 years ago

Thanks, that makes complete sense. With all the hype around RAIL, I'm a bit disappointed that we still don't have a proper way to know when something very basic (e.g., text) is painted on the screen. Sure, we now have the Navigation Timing & Resource Timing APIs, but none of these give any meaningful data when it comes to what the user sees. Speed Index is better (thanks for that!), but it's not available in RUM and doesn't play nice with the carousels product people love.

Ilya mentioned event.timeStamp ("R"), Frame Timing ("A") and requestIdleCallback ("I") during his Chrome Dev Summit talk, but looking at the W3C Web Performance group roadmap (http://www.w3.org/wiki/Web_Performance/Publications) I don't see anything that could provide an actual answer to the "L" of RAIL, i.e., "Provide a meaningful user experience under 1 second", so I don't understand how people can implement RAIL if they can't measure real life :(

pmeenan commented 9 years ago

FWIW, if you test with Chrome Canary you can get some newer experimental measurements we are working on (but none of them are exposed to the DOM): http://www.webpagetest.org/xmlResult/151203_M5_XJS/

There are measurements for the first image and text paints. We're still working on validating them to make them as accurate as possible (and on Speed Index inside of Chrome). With any luck, if we can get it to work reliably inside of Chrome, you'll start to see discussions about exposing it to the DOM.

pmeenan commented 9 years ago

I should have mentioned what to look for: the chromeUserTiming.* values in the XML data.

laurentgoudet commented 9 years ago

FYI, I've added new marks out of curiosity (http://www.webpagetest.org/result/151204_Q8_923da835a329d33559ccbd24d90dc89d/) and:

• In the initial view, the mark is set a first time in the inline script and overwritten in the handler. WPT correctly reports the later one (off from reality by 300ms):

| profile_image | profile_image_inline | profile_image_onload | actual render time |
| --- | --- | --- | --- |
| 6.319s | 4.608s | 6.319s | 6.7s |

• In the repeat view, the onload handler has already fired as the image is served from disk, and WPT correctly reports the inline value (off from reality by 600ms):

| profile_image | profile_image_inline | actual render time |
| --- | --- | --- |
| 2.063s | 2.063s | 2.6s |

As there is nothing wrong with WPT, I'm closing this issue.

FWIW, if you test with Chrome canary you can get some newer experimental measurements [...] for the first image and text paints.

I'm not sure how helpful these metrics would be, as on most sites the navigation (image/logo + text) will render first, with little or no correlation with what users are actually waiting for (the content). And startRender would probably report the same value.

[...] working on Speed Index inside of Chrome

This seems to be the much-needed metric that the Web platform is waiting for. However:

• Decision makers don't like it because it's an index, not a number of seconds, and they would prefer to use a less meaningful metric (e.g., window.onload) as it feels "more real" ("let's get under 5 seconds" vs. "how good is 3000?"). How about reporting the time to a percentage of visual completeness (e.g., 25%, 50%, 75%, 95%, 100%)? For instance, looking at the visual progress chart of the previous test (http://www.webpagetest.org/video/compare.php?tests=151204_Q8_923da835a329d33559ccbd24d90dc89d-r:1-c:1):

[screenshot: visual progress chart, 2015-12-04 (https://cloud.githubusercontent.com/assets/2169585/11593153/fb39457a-9aa2-11e5-935e-8f3b935eea1d.png)]

| 25% visually complete | 50% visually complete | 75% visually complete | 95% visually complete | 100% visually complete |
| --- | --- | --- | --- | --- |
| 2.5s | 3.1s | 3.1s | 3.1s | 5.5s |

Coupled with startRender, if I had access to that data in RUM I would know precisely when each of the main elements of the viewport is painted (navigation -> content -> cover image), and I could report an actual "number of seconds" to the business side. Considering that 90%/95% is probably the threshold at which most pages are "fully loaded", ads & social widgets aside, that could become the "one size fits all" metric (or maybe 100%; after all, as a user, if the page is still displaying things it's not really loaded, ad blindness apart). A sketch of deriving these threshold times from the progress curve follows below.

• Carousels break it, and designers love carousels (http://www.webpagetest.org/video/compare.php?tests=151204_2V_0809d5ec7916ad29b378d5669900e3ea-r:1-c:0). A magic URL param to disable them in synthetic monitoring won't work in real life.

It's probably tricky to implement consistently in a cross-platform way, and probably tricky to implement at all without massively slowing down the rendering pipeline, but nothing can't be solved by software :)
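As referenced above, a minimal sketch of deriving time-to-threshold values from a visual progress curve; the sample points are taken from the table above, and a real curve would have many more samples:

```js
// Given visual progress samples [timeMs, percentComplete] in ascending
// time order, return the first time a completeness threshold is reached.
function timeToVisuallyComplete(curve, threshold) {
  for (var i = 0; i < curve.length; i++) {
    if (curve[i][1] >= threshold) return curve[i][0];
  }
  return null; // threshold never reached
}

// Coarse curve matching the table above:
var curve = [[2500, 25], [3100, 95], [5500, 100]];
var times = [25, 50, 75, 95, 100].map(function (t) {
  return timeToVisuallyComplete(curve, t);
});
// -> [2500, 3100, 3100, 3100, 5500], i.e. 2.5s, 3.1s, 3.1s, 3.1s, 5.5s
```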

Another option would be to have a PerformanceObserver that takes not performance entries as input (like http://www.w3.org/TR/performance-timeline-2) but DOM nodes (like MutationObserver), and reports the paintTime of each observed node. I see that as complementary to a broader metric like Speed Index, but it would still be widely useful.
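A sketch of what that could look like (PaintObserver, paintTime, and this observe() signature are invented for illustration; no browser implements them):

```js
// Hypothetical MutationObserver-style API for per-node paint times.
var observer = new PaintObserver(function (entries) {
  entries.forEach(function (entry) {
    // entry.target would be the observed DOM node, entry.paintTime the
    // moment its pixels first hit the screen.
    console.log(entry.target.id, entry.paintTime);
  });
});
observer.observe(document.getElementById('profile-img'));
observer.observe(document.getElementById('cover-img'));
```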

pmeenan commented 9 years ago

Yep, a performance observer is the ultimate goal, but there has been a lot of push-back because it opens up a lot of potential privacy/sniffing attacks :-(. Hopefully we get to the point where it doesn't offer any new vectors that couldn't also be exploited another way, and we can make progress on it. I expect that's on the order of years, though.
