GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/

`lantern-idle-callback-short` is flaky #14271

Open · adamraine opened this issue 2 years ago

adamraine commented 2 years ago

Been seeing this a lot

https://github.com/GoogleChrome/lighthouse/runs/7755431040?check_suite_focus=true

adamraine commented 2 years ago

Also legacy nav: https://github.com/GoogleChrome/lighthouse/runs/7755430152?check_suite_focus=true

adamraine commented 2 years ago

The extra blocking time appears to come from an event like this:

```json
{
  "args": {},
  "cat": "disabled-by-default-devtools.timeline",
  "dur": 77866,
  "name": "EvaluateScript",
  "ph": "X",
  "pid": 3620,
  "tdur": 50575,
  "tid": 1,
  "ts": 450686353,
  "tts": 1161020
},
```
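(For scale: trace durations are in microseconds, so this single event is ~77.9 ms; under the usual long-task accounting, only the time beyond 50 ms counts as blocking, so this one event would contribute roughly 28 ms of TBT on its own.)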

Doing some digging into where this event is emitted in V8, it looks like it's from the poller script injected by Puppeteer via `Runtime.evaluate`. Unfortunately, there isn't anything in the trace event to identify this script, so we can't easily ignore it.

Would it be OK to ignore any `EvaluateScript` events emitted with the `disabled-by-default-devtools.timeline` category?
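If we went that route, the filter itself would be small. A rough sketch (not actual Lighthouse code; the exact match on the category is an assumption based on the event above):

```js
// Sketch: drop EvaluateScript events emitted on the
// disabled-by-default-devtools.timeline category (the injected-script case above).
// Field names (name, cat) mirror the raw trace event shown earlier.
function dropInjectedEvaluateScript(traceEvents) {
  return traceEvents.filter(event => {
    return !(event.name === 'EvaluateScript' &&
             event.cat === 'disabled-by-default-devtools.timeline');
  });
}
```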

connorjclark commented 2 years ago

> Puppeteer via Runtime.evaluate

I think this is just any CDP `Runtime.evaluate` FWIW, so it applies to the legacy runner / all our page function code.

Even though legacy happens less often, it would be good to figure out what code we're running in the page context that emits these long tasks.

For FR, we can either find out how to disable the culprit or ignore those tasks in various places in LH (which seems like it could be hard to get right).

adamraine commented 2 years ago

I was doing some testing with the latest version of Puppeteer (18.x.x), and it seems like the `EvaluateScript` from the injected code isn't super long anymore. I guessed https://github.com/puppeteer/puppeteer/pull/8946 may have fixed it, but I'm not certain. See https://github.com/GoogleChrome/lighthouse/pull/14385, which was failing `lantern-idle-callback-short` most of the time but stopped failing once Puppeteer was bumped to 18.x.x.

We should still emit a warning if Runtime.evaluate calls block the main thread too much. I will continue to investigate the impact of Puppeteer 18.x.x.
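Something like the following could back that warning. This is only a sketch with a placeholder threshold and message, not a settled design:

```js
// Sketch: surface a warning for injected Runtime.evaluate work that runs long
// enough to matter for blocking time. The 50 ms cutoff is illustrative only.
const LONG_EVALUATE_THRESHOLD_US = 50 * 1000;

function getInjectedEvaluateWarnings(traceEvents) {
  return traceEvents
    .filter(e => e.name === 'EvaluateScript' &&
                 e.cat === 'disabled-by-default-devtools.timeline' &&
                 e.dur > LONG_EVALUATE_THRESHOLD_US)
    .map(e => `Runtime.evaluate blocked the main thread for ${(e.dur / 1000).toFixed(1)} ms`);
}
```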

adamraine commented 2 years ago

A bisect supports https://github.com/puppeteer/puppeteer/pull/8946 as the fix for this issue: Puppeteer's util injection no longer returns by value over the protocol.
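For context, `returnByValue` is a parameter of CDP's `Runtime.evaluate`: when it's true, the whole result is serialized in the renderer and sent over the protocol, which is plausibly where the long `EvaluateScript` time was going. Roughly, for illustration (`session` and the expression are placeholders, not Puppeteer internals):

```js
// Illustration only: the same evaluation with returnByValue off.
// With returnByValue: true the result is fully serialized to JSON in the page;
// with false, only a RemoteObject handle comes back over the protocol.
const {result} = await session.send('Runtime.evaluate', {
  expression: 'collectMetrics()', // placeholder expression
  returnByValue: false,           // the behavior the PR above switched to
});
```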

adamraine commented 2 years ago

So upgrading Puppeteer in #14385 did help for the reasons I mentioned above, but we are still seeing failures in the DevTools smoke runner sometimes.

Example https://github.com/GoogleChrome/lighthouse/actions/runs/3185898767/jobs/5196014984

adamraine commented 2 years ago

I did a little investigation; the new failures seem to be from the idle callbacks taking too long on the main thread:

[Screenshot from 2022-10-05: trace of the main thread showing the long idle callback tasks]

Unfortunately, I still can't get a local repro. The current theory is that the extra setup for DT smokes (loading the page initially, opening the DT panel, etc.) uses too much processing power and slows down the idle callback tasks.

adamraine commented 1 year ago

Gonna increase DT retries

paulirish commented 1 year ago

`lantern-idle-callback-short` is testing `ric-shim?short`, which is a fairly awkward situation that's arguably an edge case.

We can improve it to be a bit more true to rIC usage in the wild, or remove it. As of now, it's understandable that this is flaky.

brendankenny commented 1 year ago

According to https://chromestatus.com/metrics/feature/timeline/popularity/954, 30-40% of page loads include a `requestIdleCallback` call. If we continue to shim it (which currently seems like the only option when using simulated throttling), it does feel like we should have enough confidence in it that this test can run reliably, whether that requires a change in the shim or establishing the expected variance and adding it to the test expectations.
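For readers unfamiliar with the shim: under simulated throttling the page actually runs unthrottled, so the general idea is to wrap `requestIdleCallback` and cap the deadline the callback sees, keeping idle work proportional to the simulated CPU. A minimal sketch of that idea (not Lighthouse's actual shim; the 10 ms cap and multiplier handling are illustrative):

```js
// Sketch: wrap requestIdleCallback so callbacks see a capped deadline under
// simulated throttling. Not Lighthouse's real shim; the numbers are made up.
function shimRequestIdleCallback(cpuSlowdownMultiplier) {
  const nativeRic = window.requestIdleCallback.bind(window);
  window.requestIdleCallback = (callback, options) => {
    return nativeRic(deadline => {
      const cappedDeadline = {
        didTimeout: deadline.didTimeout,
        // Pretend the browser has less idle time than it really does,
        // scaled by the simulated CPU slowdown.
        timeRemaining: () => Math.min(deadline.timeRemaining(), 10 / cpuSlowdownMultiplier),
      };
      callback(cappedDeadline);
    }, options);
  };
}
```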

adamraine commented 1 year ago

I don't think the variance in the test case comes from our `requestIdleCallback` shim. When running the test locally, it passes 100% of the time, and it fails 100% of the time if I remove our shim.

I believe the variance comes from CPU slowdown on CI bots, which causes tasks to randomly take longer than normal. I think we need a different expectation than TBT < 100 ms, one that can reliably verify our shim is working properly while allowing for the extra blocking time caused by slow CI bots.

adamraine commented 1 year ago

Gonna try reproducing in PSI and/or on a page with lots of OOPIFs.