Open tollmanz opened 6 years ago
Thanks a million for the very detailed bug report @tollmanz! My best guess is that Chrome is crashing when trying to decode this image either because of an OOM or some CentOS-specific weirdness.
Re #1: If you can reproduce the crash in Chrome just by loading the page while tracing (perhaps trying with puppeteer), a bug filed over at http://crbug.com/ would be awesome 👍 I'm afraid we're not the best resource for image loading pipeline crashes :)
Re #2: You're totally right we should try to more gracefully handle targetCrashed occurrences in Lighthouse! I'll let this bug track that effort.
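One way to picture the graceful handling mentioned above is to race each pending operation against the crash event, so a crash turns into a rejection instead of an endless wait. This is only a sketch and not Lighthouse's actual code: the helper name rejectOnTargetCrash and the use of a plain EventEmitter as the connection are assumptions for illustration; only the event name Inspector.targetCrashed comes from this report.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not a Lighthouse API): settles with `work`, or rejects
// as soon as `connection` emits the crash event, whichever happens first.
function rejectOnTargetCrash(connection, work, eventName = 'Inspector.targetCrashed') {
  return new Promise((resolve, reject) => {
    const onCrash = () => reject(new Error(`${eventName}: the page target crashed`));
    connection.once(eventName, onCrash);
    // Remove the listener once the work settles so listeners do not pile up.
    const cleanup = () => connection.removeListener(eventName, onCrash);
    work.then(
      (value) => { cleanup(); resolve(value); },
      (err) => { cleanup(); reject(err); }
    );
  });
}

// Usage with a stand-in emitter; a real run would listen on the debugger connection.
const connection = new EventEmitter();
const stuckGatherer = new Promise(() => {}); // simulates work that never settles after a crash
rejectOnTargetCrash(connection, stuckGatherer)
  .catch((err) => console.error(err.message));
connection.emit('Inspector.targetCrashed', {});
```

With a guard like this, the caller sees an ordinary error it can catch and report, rather than a promise that stays pending forever.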
Hi @patrickhulce! Thanks for the feedback. After spinning my wheels trying to get Puppeteer working on Debian with no luck, I realized I could easily reproduce this issue directly with Chrome Headless. I've logged a bug with Chrome. If you are interested, I have added a new branch to my repo for demonstrating the issue. TL;DR, loading the sample page via:
google-chrome-stable --headless --enable-logging --v=99 --no-sandbox --disable-gpu https://tollmanz.github.io/lighthouse-bug-repro/page/index.html
will crash Chrome with the following error:
[0514/020857.744852:ERROR:headless_shell.cc(348)] Abnormal renderer termination.
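If you capture Chrome's log output, a wrapper script could scan it for this message to tell a renderer crash apart from other failures. A minimal sketch, assuming the message text stays exactly as shown above; the function name findRendererCrash is made up for illustration.

```javascript
// Marker text copied from the log line above; it may differ in other Chrome versions.
const CRASH_MARKER = 'Abnormal renderer termination';

// Returns the first ERROR log line that reports the renderer crash, or null if none.
function findRendererCrash(logText) {
  const match = logText
    .split(/\r?\n/)
    .find((line) => line.includes('ERROR') && line.includes(CRASH_MARKER));
  return match || null;
}

// Usage: feed it whatever was captured from Chrome's stderr.
const capturedLog = [
  '[0514/020857.744852:ERROR:headless_shell.cc(348)] Abnormal renderer termination.',
].join('\n');
console.log(findRendererCrash(capturedLog) !== null ? 'renderer crashed' : 'no crash found');
```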
tracked upstream in https://bugs.chromium.org/p/chromium/issues/detail?id=842679
Since this is related to headless Chrome, perhaps the new --headless=chrome option will resolve it. Could someone impacted try that?
Problem
When testing a page with a large image (~4 MB), an unrecoverable Inspector.targetCrashed error can occur on CentOS 7. I think this error crashes Chrome, but Lighthouse keeps waiting for gatherers and never stops running. Ultimately, the process hangs, is undetectable, and the service needs to be restarted. I cannot reproduce the issue on OS X.
Environment
Node: 8.11.1
Lighthouse: 2.9.4
CentOS: 7.4.1708
Reproduce steps
I've prepared a test environment and test page to reproduce this bug. It's very tough to reproduce otherwise.
1. Clone the repo
2. cd into the clone directory
3. Install Docker if you don't already have it
4. Build the CentOS box. It is very simple and installs CentOS 7, Google Chrome (with deps), Node/NPM, and Lighthouse
5. Attach to the image
6. Run Lighthouse against the test page
A few notes about this config:
- --disable-network-throttling or not. I find that the bug is more reproducible with it. Without it, you sometimes get other network errors. Also, it's a lot faster :)
- --chrome-flags used seem to be necessary to run Chrome via the Chrome Launcher on CentOS
- --verbose flag will show the Inspector.targetCrashed {} error message when the bug is produced
- google-chrome-stable package is used, as that seems to be the recommendation from others using CentOS, and it seems to work well other than in this case

This error does not occur every single time. In my testing, it occurs in 3 out of 4 tries. You'll know you've hit the error when you see something like:
Once you hit this step, I encourage you to wait at least one minute. The Gatherers have a 1 minute timeout and sometimes additional output is generated; however, the run will not complete, an error will not be thrown, and the process will hang.
Context
I see this error a lot. It's tough to quantify because I can't catch and handle the error. In production, I do not run the verbose logs, as that would create way too many logs. Without the verbose logging, the only indication I get is that the instance hangs. I have a process that checks whether a test has been running for more than 3 minutes and, if so, triggers a restart. I'm running 20-30k tests/day and can see hundreds of these process "hang" issues an hour, all depending on the current state of a page. I finally found one page that regularly caused the issue and could reduce it down to the test case I provided above.
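For anyone who needs a similar stopgap, the watchdog idea described above can be sketched as a wrapper that runs each test in a child process and kills it after a deadline. This is an illustration under assumptions, not the setup used in this report: runWithDeadline is a hypothetical name, and the 3-minute value simply mirrors the limit mentioned above.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: runs a command and force-kills it if it outlives timeoutMs.
// Resolves with the exit code, the terminating signal, and whether the deadline was hit.
function runWithDeadline(command, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'ignore' });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill('SIGKILL'); // the hung process will not exit on its own
    }, timeoutMs);
    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err); // e.g. the command could not be started
    });
    child.on('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}

// Usage (commented out so the sketch has no side effects when loaded):
// runWithDeadline('lighthouse', ['https://example.com'], 3 * 60 * 1000)
//   .then(({ timedOut }) => { if (timedOut) console.error('run exceeded 3 minutes'); });
```

Note that killing the parent process this way may leave the Chrome instance it launched still running, so a real service would likely need to clean that up separately.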
Solution
There are two things I would like to work out within the context of this bug:
Full output of issue