garris / BackstopJS

Catch CSS curve balls.
http://backstopjs.org
MIT License
6.72k stars 602 forks source link

BackstopJS hangs in Docker container #603

Closed questionmarc closed 6 years ago

questionmarc commented 6 years ago

I seem to keep having this problem where the chrome headless hangs forever until the script ends. Here's the output I'm seeing:

Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--no-sandbox","--window-size=320,480"],"port":9222,"waitTimeout":30000,"visible":false}
Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--no-sandbox","--window-size=1024,768"],"port":9223,"waitTimeout":30000,"visible":false}
Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--no-sandbox","--window-size=1600,900"],"port":9224,"waitTimeout":30000,"visible":false}
9223 Chrome v61 detected.
9223 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***
9222 Chrome v61 detected.
9222 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***
9224 Chrome v61 detected.
9224 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***
bash-4.3#

This is utilizing the same dockerfile as the one listed in the repo. I'm not sure why it's hanging but I don't get any real output. Changing the engine to PhantomJS works, and from looking at the debug output from the debug flag, it is reaching the webpage. I tried adding some flags to the docker run from reading this project: https://github.com/yukinying/chrome-headless-browser-docker . I also tried to add the --no-sandbox option to the chromeFlags but to no avail.

So I'm not sure if this is a chrome-headless issue? An issue with chromium v61? Alpine? I'm also not sure how to debug this further.

If you have any insight as to things I could be doing to make progress, please let me know.

questionmarc commented 6 years ago

New Development: with debug on and setting the asyncCaptureLimit to 1, I got one of the scenarios to complete. But then the other two time out.

questionmarc commented 6 years ago

output for that run was something like

Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--no-sandbox","--window-size=320,480"],"port":9222,"waitTimeout":30000,"visible":false}
9222 Chrome v61 detected.
9222 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***
9222 INFO >  BackstopTools have been installed.
Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--no-sandbox","--window-size=1024,768"],"port":9223,"waitTimeout":30000,"visible":false}
9223 Chrome v61 detected.
9223 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***
Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--no-sandbox","--window-size=1600,900"],"port":9224,"waitTimeout":30000,"visible":false}
9224 Chrome v61 detected.
9224 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***

So it looks like the BackstopTools finished on the first one, but then timed out on the other two?

garris commented 6 years ago

What does the report look like?

questionmarc commented 6 years ago
      COMMAND | Executing core for `report`
      compare | Chromy error: GotoTimeoutError. See scenario ā€“ scenario 1 (tablet)
      compare | Chromy error: GotoTimeoutError. See scenario ā€“ scenario 1 (desktop)
   See: /usr/src/app/backstop_data/bitmaps_test/20171121-215652/failed_diff_test_scenario_1_0_document_0_phone.png
      compare | ERROR { requireSameDimensions: false, size: isDifferent, content: undefined%, threshold: 0.1% }: scenario 1 test_scenario_1_0_document_0_phone.png
       report | Test completed...
       report | 0 Passed
       report | 3 Failed
       report | Writing jUnit Report
       report | Writing browser report
       report | jUnit report written to: backstop_data/ci_report/xunit.xml
       report | Browser reported copied
       report | Copied configuration to: /usr/src/app/backstop_data/html_report/config.js
       report | *** Mismatch errors found ***
      COMMAND | Command `report` ended with an error after [0.731s]
      COMMAND | Command `test` ended with an error after [66.935s]

One of the failures was a correct failure (the ui did infact change so I was expecting that) but the other two compares end in GotoTimeoutError.

The html report just has blank spaces where the tests failed.

kiran-redhat commented 6 years ago

Change GoToTimeout to 360000 and see if still coming?

@garris, There seems to be some issue with gototimeout which is fixed in chromy .5.6 can we upgraded Backstop chromy dependency? I have done that in my fork and all seems to work fine

questionmarc commented 6 years ago

I think the issue is because I'm trying to do too much at once. I have a 8gb backstop_data folder with the complete set of reference images I want to use for my.. 3000 some odd test cases. šŸ˜… I tried it out locally with just the sample init file using a fork. Sure enough it didn't implode like my previous test case had. šŸ˜ƒ

Now the real question is how I can reduce what I'm doing or break up the backstop_data into different containers.. šŸ¤”

questionmarc commented 6 years ago

Hm.. I think I'm wrong again about that.. seems it's just my scenarios that are failing somehow. I don't get this failure when I run locally, just in the docker container. So strange...

questionmarc commented 6 years ago

Looks like there's an internal timer that after a certain number of seconds pass the application quits? Is this different from waitTimeout ? Can that be edited with an option?

questionmarc commented 6 years ago

So.. After some debug statementing.. I can confirm that it is reaching the process.on('exit') event when it exits the application. Even though the headless chrome hasn't returned from my scenario. šŸ¤”

Edit: Maybe there's something wrong with my scenario? I'll add it below.

{
  "label": "en-us WoW Homepage Full Page check",
  "url": "https://worldofwarcraft.com/en-us/",
  "hideSelectors": [
    "body > div.body > div > div.page-Home > div.page-Home-masthead > div > div.VideoPane-bg > video",
    "body > div.body > div > div.page-Home > div.page-Home-masthead > div > div.VideoPane-content > div:nth-child(7)"
  ]
}
questionmarc commented 6 years ago

It seems that setting the timer on Chromy has no effect. No matter what value I set it to the scenario always ends at ~30s.

Edit: Didn't realize gotoTimeout and waitTimeout were 2 different timeout settings in chromy. šŸ˜ž

questionmarc commented 6 years ago

Even with a gotoTimeout of 5min+ the process silently fails and eventually receives the exit signal with code 0.

Added to cli/index.js:50:

  process.on('exit', function (code) {
    console.log(`Exit signal recieved: ${code}`);
    process.exit(exitCode);
  });

And then in my docker container's bash:

Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--window-size=1024,768"],"port":9226,"waitTimeout":120000,"visible":false,"gotoTimeout":300000}
Cookie state restored with: []
SCENARIO > en-us WoW Homepage Full Page check
9226 Chrome v61 detected.
9226 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***
Starting Chromy: {"chromeFlags":["--disable-gpu","--force-device-scale-factor=1","--disable-infobars=true","--window-size=1600,900"],"port":9227,"waitTimeout":120000,"visible":false,"gotoTimeout":300000}
Cookie state restored with: []
SCENARIO > en-us WoW Homepage Full Page check
9227 Chrome v61 detected.
9227 ***WARNING! CHROME VERSION 62 OR GREATER IS REQUIRED. PLEASE UPDATE YOUR CHROME APP!***
Exit signal recieved: 0

Note that the process doesn't even enter the report portion.

Edit: Kept tracking all the way down into Chromy itself. I'm not sure why but the process just hangs on the this.client.Page.navigate({url: url}) call. That's controlled by the chrome-remote-interface which just fails after the timeout and causes chromy to throw GotoTimeoutError. But it still ends the process, which is annoying. :/

questionmarc commented 6 years ago

Ok.. I solved the issue. It was a docker issue all along. šŸ˜“ Linking the reference material that got me to solving the problem (which was ironically because I was doing research into alternatives to chromy and found puppeteer): https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md#tips

essentially adding --shm-size=2gb and --cap-add=SYS_ADMIN to my docker run command let chrome have enough shared memory to finish running the test.

What was probably happening was that the chrome process was running out of shared memory and couldn't finish loading the page (huge page, lots of assets, etc...) So then when that occurred, it was being treated as a gotoTimeout because it couldn't load the page with the shared space provided by docker by default (which is 64mb by default).

After increasing the shared memory size to 2gbs the process completed successfully. šŸ˜„

garris commented 6 years ago

@questionmarc Great -- thank you for posting a fix!

see also... https://github.com/garris/BackstopJS/issues/537#issuecomment-339710797

garris commented 6 years ago

I just added a link to this issue in the Docs.

kiran-redhat commented 6 years ago

@questionmarc @kevineye @garris I am still randomly getting GotoTimeoutError: goto() timeout when I used asyncCaptureLimit 10 (all my tests have lots of content similar to www.amazon.com) but I don't see issue when only selectors or pages with small content is tested. Is this same with you guys?

kevineye commented 6 years ago

@kiran-redhat When I saw this issue, either larger pages or large screencap sizes seemed to require more shared memory -- I don't remember which or if I ever tried to figure out the difference. Obviously more running at once (higher asyncCaptureLimit) also made it happen more often, but I remember I would have the problems with some pages/captures even with asyncCaptureLimite = 1 until I raised the docker shared memory available.

I did not need 2gb for my tests to run at asyncCaptureLimit=10, just 512mb, but if your pages/captures are large enough maybe you need even more, or maybe there's another problem.

I don't believe the actual error I got involved "GotoTimeoutError". I think it was WaitTimeoutError: evaluate(). I could be wrong -- I no longer have the error in front of me, but I remember it matching #537 pretty closely.

When debugging, I was able to run df and see the usage of /dev/shm going up and run ps to see chrome processes running and then getting killed (all in the backstop container while backstop was running). I also turned on enough logging (some by tweaking the src of installed js libs -- again, in the container) to actually see chrome being killed with a bus error trying to address invalid or out of bounds memory addresses.

Good luck!

kiran-redhat commented 6 years ago

Hey @kevineye thanks for you analysis. I also observed that it is not working even with gotToTimeOut of 7 mins, as @questionmarc mentioned above in 'Edit' section, it is just silently failing by throwing GotoTimeoutError: goto() timeout when it can't open given url.

AndreKoepke commented 1 year ago

I must also activate "Rosetta" on my Mac M1.
image