Closed: csliva closed 2 years ago
Hey @csliva could you do some test runs of this new custom metric on WebPageTest against a sample of common web pages (ideally including one with a broken HEAD that this detects) and include links to the results as a comment in the PR?
More details here: https://github.com/HTTPArchive/custom-metrics#testing
Good call @tunetheweb, tests failed. WPT moves invalid elements back into the body so I think I'll have to use $WPT_BODIES.
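@csliva It's the browser that's doing it rather than WPT: when it builds the DOM, it truncates the head at the first invalid element, so that element and everything after it end up in the body.
$WPT_BODIES should get you there, since it gives you the raw response bodies rather than the parsed DOM, but it is probably a bit more work.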
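To illustrate why the raw markup is needed, here is a minimal offline sketch in Python of the kind of check being discussed. It is not the metric from this PR (that one runs as JavaScript inside WebPageTest); `HeadScanner`, `check_head`, and the `VALID_HEAD_ELEMENTS` set are hypothetical names chosen for illustration, and the list of allowed elements is an assumption. The output shape mirrors the `invalidElements` / `invalidHead` fields shown in the test results in this thread.

```python
from html.parser import HTMLParser

# Assumption: the elements treated as valid inside <head>.
VALID_HEAD_ELEMENTS = {
    "title", "meta", "link", "script", "style", "base", "noscript", "template",
}


class HeadScanner(HTMLParser):
    """Collects start tags seen between <head> and </head> in raw markup."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.invalid_elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag names before calling this method.
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in VALID_HEAD_ELEMENTS:
            self.invalid_elements.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def check_head(raw_html):
    """Scan the unparsed HTML source and report elements that do not belong in <head>."""
    scanner = HeadScanner()
    scanner.feed(raw_html)
    scanner.close()
    return {
        "invalidElements": scanner.invalid_elements,
        "invalidHead": bool(scanner.invalid_elements),
    }


if __name__ == "__main__":
    sample = "<html><head><title>t</title><div>oops</div></head><body></body></html>"
    print(check_head(sample))
```

Because this reads the source text instead of a browser-built DOM, the stray element is still seen where the author wrote it. The sketch is deliberately simplified: it does not special-case content nested inside `noscript` or `template`.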
URL with broken head: https://crawler-test.com/other/non_head_tag_in_head
WPT test URL: https://www.webpagetest.org/details.php?test=220518_AiDcAF_DFS&run=1&cached=0
Output: {"invalidElements":["div"],"invalidHead":true}
URL with correct head: https://developer.mozilla.org/en-US/
WPT test URL: https://www.webpagetest.org/details.php?test=220518_AiDcP0_DQW&run=1&cached=0
Output: {"invalidElements":[],"invalidHead":false}
@csliva ping, a couple of comments still outstanding
This looks really great! One note, we may want to force lower-case on tag name matching.
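A small Python sketch of the lower-casing suggestion, under stated assumptions: `invalid_head_tags` and `VALID_HEAD_ELEMENTS` are hypothetical names, and the allowed set is an assumption rather than the list used in this PR. The idea is simply to normalize each tag name before comparing it, so that `DIV`, `Div`, and `div` are all treated the same way.

```python
# Assumption: the elements treated as valid inside <head>.
VALID_HEAD_ELEMENTS = frozenset(
    {"title", "meta", "link", "script", "style", "base", "noscript", "template"}
)


def invalid_head_tags(tag_names):
    """Return the lower-cased tag names that are not allowed in <head>."""
    normalized = (name.strip().lower() for name in tag_names)
    return [name for name in normalized if name not in VALID_HEAD_ELEMENTS]


if __name__ == "__main__":
    # Mixed-case names as they might appear in hand-written markup.
    print(invalid_head_tags(["TITLE", "Meta", "DIV", "p"]))
```

Without the normalization step, an upper-case `TITLE` or `META` in the source would be reported as invalid even though it is a legitimate head element.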
Additional tests with a test site I built to make sure XML parsing worked.
URL with broken head: https://crawlgo.fly.dev/badhead
WPT test URL: https://www.webpagetest.org/details.php?test=220525_BiDcTF_FW7&run=1&cached=0
Output: {"invalidElements":["div", "p"],"invalidHead":true}
According to these webmaster guidelines, a head will be terminated by invalid HTML elements within the head node. This is worth tracking because Googlebot will terminate the head early at that point, which can cause SEO issues.
This additional custom metric returns a boolean, invalidHead, indicating whether any invalid elements were found, along with the list of those elements in invalidElements.
Tests
URL with broken head: https://crawler-test.com/other/non_head_tag_in_head
WPT test URL: https://www.webpagetest.org/details.php?test=220518_AiDcAF_DFS&run=1&cached=0
Output: {"invalidElements":["div"],"invalidHead":true}
URL with correct head: https://developer.mozilla.org/en-US/
WPT test URL: https://www.webpagetest.org/details.php?test=220518_AiDcP0_DQW&run=1&cached=0
Output: {"invalidElements":[],"invalidHead":false}
URL with broken head: https://crawlgo.fly.dev/badhead
WPT test URL: https://www.webpagetest.org/details.php?test=220525_BiDcTF_FW7&run=1&cached=0
Output: {"invalidElements":["div", "p"],"invalidHead":true}