MontFerret / ferret

Declarative web scraping
https://www.montferret.dev/
Apache License 2.0

operation timed out: WAIT_NAVIGATION #468

Open selmison opened 4 years ago

selmison commented 4 years ago

Describe the bug: I am trying to get the URLs of all the videos on jw.org with the script below, but Ferret fails with the following error:

Failed to execute the query operation timed out: WAIT_NAVIGATION(jw_videos) at 7:0

To reproduce:

  1. Run docker run -d -p=0.0.0.0:9222:9222 --name=chrome-headless -v /tmp/chromedata/:/data alpeware/chrome-headless-stable

  2. Run ferret --cdp http://127.0.0.1:9222

  3. Run the following script:

    LET jw_videos = DOCUMENT("https://www.jw.org/pt/biblioteca/videos", {
        driver: "cdp",
        userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"
    })
    WAIT_ELEMENT(jw_videos, '.sectSynHdg', 20000)
    CLICK(jw_videos, 'div[class*=pageSectionContainer]:nth-of-type(3) div[class*=syn-body] > h3 > a')
    WAIT_NAVIGATION(jw_videos)
    WAIT_ELEMENT(jw_videos, '.sectSynHdg', 10000)
    FOR el IN ELEMENT(jw_videos, 'div[class*=pageSectionContainer]:nth-of-type(3) div[class*=syn-body] > h3 > a[href]')
        RETURN el

Expected behavior: Get the URLs of all the videos on the site.


ziflex commented 4 years ago

For some reason, the page crashes in headless mode and freezes in normal (non-headless) mode.