Performance custom metric `is_lcp_statically_discoverable` is incorrectly reporting `false`

HTTPArchive / custom-metrics

Custom metrics to use with WebPageTest agents

Apache License 2.0

19 stars 22 forks source link

Performance custom metric `is_lcp_statically_discoverable` is incorrectly reporting `false` #60

Open kevinfarrugia opened 1 year ago

kevinfarrugia commented 1 year ago

Test URL: https://webpagetest.httparchive.org/result/221228_GY_1/1/details/#waterfall_view_step1

The LCP image is https://marieefleurir.com/html/template/default/img/top/firstviw01.jpg and is included in the initial HTML document sent by the server. I understand this should be recorded as is_lcp_statically_discoverable as true.

The root cause seems to be because raw_html: {}, which in turn is caused because WPT_BODIES[0].response_body: null.

See test: https://webpagetest.httparchive.org/result/221228_9S_4/1/details/#waterfall_view_step1 which includes custom metric:

[01_test]
const response_bodies = $WPT_BODIES[0];

return ({
  response_bodies
});

Is this a known issue? Why is response_body body null?

pmeenan commented 1 year ago

WPT will only store/process the bodies if they can be decoded as utf-8 in Python.

Even though the response headers say it is utf-8, Python is throwing a decode error when trying to decode it:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 453869: invalid start byte

I can probably modify the agent to ignore decode errors explicitly for responses with text content types which will help fill in missing responses but they will be filtered in that case (better than not present I guess).

pmeenan commented 1 year ago

PR has been sent for the agent. The HA crawl uses a fork that I control though so the change has been rolled out for the HA agent and will be picked up in the January crawl and should be live on the HA test instance within an hour.

I'll verify the fix on the HA test instance in an hour or so to make sure it also fixed the custom metric as a result.

pmeenan commented 1 year ago

Verified that it is working for that page now: https://webpagetest.httparchive.org/result/221228_YV_A/

Coverage in the January crawl should be improved but I still won't guarantee 100% coverage on response bodies (though finding edge cases like this helps improve it).

We can keep this open until you verify that it is fixed in the actual crawl.

kevinfarrugia commented 1 year ago

Thank you. Looks great. 🙏

kevinfarrugia commented 1 year ago

I think we are able to measure the impact of the fix after the January crawl using

#standardSQL

CREATE TEMPORARY FUNCTION getRawHtml(payload STRING)
RETURNS INT64
LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var wpt_bodies = JSON.parse($._wpt_bodies);
  if (wpt_bodies) {
    return Object.keys(wpt_bodies.raw_html).length;
  }

  return 0;
} catch (e) {
  return 0;
}
''';

SELECT
  device,
  COUNTIF(raw_html = 0) AS no_raw_html,
  COUNT(0) AS total,
FROM (
  SELECT
    _TABLE_SUFFIX AS device,
    url AS page,
    getRawHtml(payload) AS raw_html,
  FROM
    `httparchive.pages.2022_10_01_*`
)
GROUP BY
  device

Currently:

+---------+-------------+----------+
| device  | no_raw_html |  total   |
+---------+-------------+----------+
| desktop |      417465 | 10346759 |
| mobile  |      594275 | 15583948 |
+---------+-------------+----------+

pmeenan commented 1 year ago

As an upper bound anyway. We may still not get to 100% but good to know it's ~4% of origins.