I started a new crawl using single-file. It fetched 10 pages successfully and then gave me this error:
Timed out after 60000 ms URL: https://xxx
Stack: ScriptTimeoutError: Timed out after 60000 ms
at Object.throwDecodedError (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\node_modules\selenium-webdriver\lib\error.js:522:15)
at parseHttpResponse (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\node_modules\selenium-webdriver\lib\http.js:549:13)
at Executor.execute (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\node_modules\selenium-webdriver\lib\http.js:475:28)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async thenableWebDriverProxy.execute (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\node_modules\selenium-webdriver\lib\webdriver.js:735:17)
at async getPageData (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\back-ends\webdriver-gecko.js:141:17)
at async Object.exports.getPageData (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\back-ends\webdriver-gecko.js:37:10)
at async capturePage (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\single-file-cli-api.js:253:20)
at async runNextTask (C:\Users\SKANGA\AppData\Roaming\npm\node_modules\single-file-cli\single-file-cli-api.js:174:20)
Is it from the remote website? Perhaps I am crawling too fast? Is it possible to delay requests by some random amount of time? Also, if I restart the same crawl, can I get single-file to skip the pages it has already downloaded?
It's due to webdriver, which is not really designed for that. I would recommend using puppeteer instead (playwright is also supported, but you have to install it via npm).
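In the meantime, the delay and resume questions can be handled outside the tool with a small wrapper script. This is a sketch, not an official feature: it assumes `single-file` is on `PATH`, that URLs are listed one per line in a file named `urls.txt`, and it uses a made-up URL-to-filename scheme. The `--back-end=puppeteer` switch follows the recommendation above; check `single-file --help` for the exact option names in your version.

```shell
#!/usr/bin/env bash
# Sketch of a re-runnable crawl wrapper; assumptions are noted inline.

# Derive an output filename from a URL (hypothetical naming scheme:
# every non-alphanumeric character becomes "_").
url_to_filename() {
  printf '%s.html' "$(printf '%s' "$1" | tr -c 'A-Za-z0-9' '_')"
}

crawl() {
  while read -r url; do
    [ -n "$url" ] || continue
    out="$(url_to_filename "$url")"
    if [ -f "$out" ]; then
      # Resume support: skip pages already downloaded on a previous run.
      echo "Skipping $url (already saved as $out)"
      continue
    fi
    # --back-end=puppeteer follows the recommendation above; verify the
    # flag against `single-file --help` for your installed version.
    single-file --back-end=puppeteer "$url" "$out"
    sleep "$(( RANDOM % 5 + 1 ))"   # random 1-5 second delay between requests
  done < "$1"
}

# Run only when a URL list is present (one URL per line).
[ -f urls.txt ] && crawl urls.txt
```

Because the output file doubles as the "already done" marker, restarting the same crawl naturally skips completed pages, and the random `sleep` spaces out requests in case the remote site is rate-limiting you.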