Closed nehakansal closed 6 years ago
max timeout is the maximum allowed timeout value, but the default timeout stays the same at 30 s. I'm afraid I don't see a timeout set when we make the splash request: https://github.com/TeamHG-Memex/undercrawler/blob/master/undercrawler/spiders.py. I think raising the timeout in undercrawler to 90 seconds makes sense (as that's the default max timeout for recent Splash versions), and it might also be useful to expose it via settings to allow making it larger.
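To illustrate the distinction, a minimal sketch (the port mapping and example URL are assumptions, not taken from this setup): `--max-timeout` only raises the ceiling on what a request is *allowed* to ask for; each request still defaults to 30 s unless it passes `timeout` explicitly.

```shell
# Raise the ceiling on per-request timeouts when starting Splash:
docker run -p 8050:8050 scrapinghub/splash --max-timeout 90

# The render endpoints still default to timeout=30, so the higher value
# must be requested explicitly on each call:
curl 'http://localhost:8050/render.html?url=https://example.com&timeout=90'
```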
Closing this issue here as it's related to undercrawler.
Okay, thanks for clarifying that. Adding it as a setting to Undercrawler would definitely be helpful. Do you plan to open an issue for it on Undercrawler? In the meantime, is adding `timeout` to the `splash_args` in the Undercrawler code the only fix if I want to run some tests with a higher timeout value? Thank you.
@nehakansal I think that adding this to the undercrawler code is the only way to fix this. I'm not working on it at the moment, but I'll be happy to merge a pull request and help with the implementation. The timeout value in the Splash API is documented here: http://splash.readthedocs.io/en/stable/api.html#execute, and this is the place to add it: https://github.com/TeamHG-Memex/undercrawler/blob/7d4f21520a9770eb94641420625b927d92537a29/undercrawler/spiders.py#L63-L69
Thank you, that's what I figured. I've added the timeout argument in the code locally for now.
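For anyone following along, here is a minimal sketch of what that local change looks like. The helper name and the 90-second default are assumptions for illustration; the real `splash_args` dict in `spiders.py` contains other keys (Lua source, etc.) that are omitted here.

```python
# Sketch: extend the splash_args dict built in undercrawler/spiders.py
# with an explicit per-request 'timeout'. The function name and default
# value are hypothetical; only the 'timeout' key is the actual change.

def make_splash_args(lua_source: str, timeout: float = 90.0) -> dict:
    """Build Splash /execute arguments with an explicit timeout.

    Splash's per-request timeout defaults to 30 s; passing 'timeout'
    raises it, up to the server's --max-timeout ceiling.
    """
    return {
        'lua_source': lua_source,     # existing key in spiders.py
        'timeout': timeout,           # new: per-request timeout in seconds
    }

args = make_splash_args('-- lua script here', timeout=90)
```

Exposing `timeout` as a Scrapy setting (as suggested above) would then just mean reading the value from `self.settings` instead of hard-coding it.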
Hi,
I checked that the docker-compose.yml sets max-timeout to 3600 for each Splash instance, but I still keep getting the following error when I crawl using the Undercrawler:
{"description":"Timeout exceeded rendering page", "error": 504, "type": "GlobalTimeoutError", "info":{"timeout": 30}}
From what I understand, this shouldn't happen because the timeout value used here should be 3600. Is that correct?
Thanks, Neha.