bslatkin / dpxdt

Make continuous deployment safe by comparing before and after webpage screenshots for each release. Depicted shows when any visual, perceptual differences are found. This is the ultimate, automated end-to-end test.
https://dpxdt-test.appspot.com
Apache License 2.0

"Capture and Crawl" fail on CaptureFailedError: Sent SIGKILL to item #168

Closed by rubig 8 years ago

rubig commented 8 years ago

To reproduce:

  1. Set up a Depicted server on Ubuntu, following the instructions for setting up a local Depicted server in the README at https://github.com/bslatkin/dpxdt
  2. Run in the console: (depicted)ubuntu@ip-10-224-91-168:~/depicted$ ./run_site_diff.sh --upload_build_id=1 --crawl_depth=1 http://www.carsales.com.au

Console output:

Scanning for content
Scanning 1 pages for good urls
Found 0 new URLs from http://www.carsales.com.au/
Finished crawl at depth 0
Found 1 total URLs, 1 good HTML pages; starting screenshots
Requesting run for http://www.carsales.com.au/
Marking runs as complete

Issue 1: The crawl fails to find any URLs on www.carsales.com.au. This may be related to the regular expression used to match URLs. I can take a look, but I think the author better understands the implications of changing that regex and is the best person to change it.

Issue 2: The screenshot fails. When I go to the Depicted server on :5000, I see: Failed after max attempts CaptureFailedError: Sent SIGKILL to item=dpxdt.client.capture_worker.CaptureWorkflow({parent: dpxdt.client.capture_worker.DoCaptureQueueWorkflow#139980265969552, config_path: '/tmp/tmp8CiiLV/config.json', args: ('/tmp/tmp8CiiLV/log.txt',), interrupted: False, output_path: '/tmp/tmp8CiiLV/capture.png', kwargs: {timeout_seconds: 20}})#139980268763856, pid=8203, run_time=20.2134301662

Full log and config in attachment.

rubig commented 8 years ago

config.json.txt full.log.txt

bslatkin commented 8 years ago

Sorry for the delay.

In the log you can see:

Still waiting for: https://googleads.g.doubleclick.net/pagead/viewthroughconversion/1017754493/?random=30626000&cv=8&fst=1457502374558&num=1&fmt=3&adtest=on&value=0&label=f015CJuTmgIQ_eam5QM&bg=666666&hl=en&guid=ON&eid=317150504&u_h=768&u_w=1024&u_ah=768&u_aw=1024&u_cd=32&u_his=1&u_tz=0&u_java=false&u_nplug=0&u_nmime=0&frm=0&url=http%3A//www.seek.com.au/&tiba=SEEK%20-%20Australia's%20no.%201%20jobs%2C%20employment%2C%20career%20and%20recruitment%20site&ctc_id=CAIVAgAAAB0CAAAA&ct_cookie_present=false&convclickts=0&ocp_id=o7jfVqOpG4m38gXvjK7YBg
Still waiting for: https://googleads.g.doubleclick.net/pagead/viewthroughconversion/1017754493/?random=1128608494&cv=8&fst=1457502374558&num=1&fmt=3&value=0&label=f015CJuTmgIQ_eam5QM&bg=666666&hl=en&guid=ON&eid=317150504&u_h=768&u_w=1024&u_ah=768&u_aw=1024&u_cd=32&u_his=1&u_tz=0&u_java=false&u_nplug=0&u_nmime=0&frm=0&url=http%3A//www.seek.com.au/&tiba=SEEK%20-%20Australia's%20no.%201%20jobs%2C%20employment%2C%20career%20and%20recruitment%20site&ctc_id=CAIVAgAAAB0CAAAA&ct_cookie_present=false&convclickts=0&ocp_id=o7jfVtLXG9Ou8AX_kqGQBg
Still waiting for: https://s.tribalfusion.com/i.cid?c=628673&d=30&page=landingPage
Still waiting for: https://googleads.g.doubleclick.net/pagead/viewthroughconversion/996179779/?random=1457502374811&cv=8&fst=1457502374811&num=1&fmt=3&label=DOw6CLX60QkQw_6B2wM&guid=ON&u_h=768&u_w=1024&u_ah=768&u_aw=1024&u_cd=32&u_his=1&u_tz=0&u_java=false&u_nplug=0&u_nmime=0&frm=0&url=http%3A//www.seek.com.au/&tiba=SEEK%20-%20Australia's%20no.%201%20jobs%2C%20employment%2C%20career%20and%20recruitment%20site&async=1
Still waiting for: https://www.google.com/ads/user-lists/1017754493/?label=f015CJuTmgIQ_eam5QM&fmt=3&bg=666666&num=1&ct_cookie_present=false&cv=8&frm=0&url=http%3A//www.seek.com.au/&eid=317150504&random=320667780
Still waiting for: https://www.google.com/ads/user-lists/1017754493/?label=f015CJuTmgIQ_eam5QM&fmt=3&bg=666666&num=1&ct_cookie_present=false&cv=8&frm=0&url=http%3A//www.seek.com.au/&eid=317150504&random=449884493
Still waiting for: https://www.google.com/ads/user-lists/996179779/?label=DOw6CLX60QkQw_6B2wM&fmt=3&num=1&cv=8&frm=0&url=http%3A//www.seek.com.au/&random=2116758165
Still waiting for: https://cm.g.doubleclick.net/pixel?google_nid=exp&google_cm&google_sc&google_ula=2786954&google_hm=18072662552493448202

That means the browser is taking too long to load those ad URLs, so the capture hits its timeout (timeout_seconds: 20 in the error above) and gets killed. What I'd do is add those URLs to the "resourcesToIgnore" list in your config.json. That will cause those URLs to be blocked when PhantomJS runs, and allow the page to finish loading.
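
As a rough illustration of that suggestion, here is a minimal Python 3 sketch that pulls the hosts out of the "Still waiting for:" log lines and appends one pattern per host to the "resourcesToIgnore" list of a config dict. The helper names `stalled_hosts` and `add_ignore_patterns` are hypothetical and not part of dpxdt. It also assumes that each "resourcesToIgnore" entry is treated as a regular expression matched against the request URL, which is why dots are escaped; check the capture script in your checkout before relying on that.

```python
import json
from urllib.parse import urlsplit

WAITING_PREFIX = "Still waiting for: "


def stalled_hosts(log_text):
    """Return the sorted, de-duplicated hosts of URLs the capture was still waiting on."""
    hosts = set()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(WAITING_PREFIX):
            url = line[len(WAITING_PREFIX):].strip()
            host = urlsplit(url).hostname
            if host:
                hosts.add(host)
    return sorted(hosts)


def add_ignore_patterns(config, hosts):
    """Return a copy of config with one pattern per host appended to resourcesToIgnore.

    Assumption: entries are regex patterns. Hostnames only contain letters,
    digits, hyphens and dots, so dots are the only characters escaped here.
    """
    updated = dict(config)
    patterns = list(updated.get("resourcesToIgnore", []))
    for host in hosts:
        pattern = host.replace(".", r"\.")
        if pattern not in patterns:
            patterns.append(pattern)
    updated["resourcesToIgnore"] = patterns
    return updated


if __name__ == "__main__":
    # Two lines taken from the log above, with the second query string shortened.
    sample_log = "\n".join([
        "Still waiting for: https://s.tribalfusion.com/i.cid?c=628673&d=30&page=landingPage",
        "Still waiting for: https://cm.g.doubleclick.net/pixel?google_nid=exp",
    ])
    config = add_ignore_patterns({}, stalled_hosts(sample_log))
    print(json.dumps(config, indent=2))
```

Blocking by host rather than by full URL is a deliberate choice in this sketch: the ad URLs in the log carry random query parameters, so exact URLs would not match on the next run.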

Please reopen if my suggestion doesn't work!