Closed ivbeg closed 1 week ago
Hi @ivbeg, let me check and address this problem as soon as possible. Thanks a lot for raising this issue!
Hi @ivbeg, please confirm which Robotspy version you are using. Thank you 😊
@andreburgaud Hi! Version 0.10.0
Thank you @ivbeg! I'm on it. I will make sure to keep you posted.
Something similar happens with this robots.RobotsParser.from_uri("http://22-lr.forumactif.com/robots.txt") but I can download the robots.txt file and if I use: robots.RobotsParser.from_string(robots_downloaded_file) it also hangs forever. Note: the file size is 20624 chars
@ivbeg Sorry for the time it took me to release 0.11: https://pypi.org/project/robotspy/. This should address the timeout issue, although it may require more scrutiny of the logic, especially with higher-level functions like can_fetch. The function from_uri now takes a parameter timeout set to 5 by default. Note that it is not a clock timeout per se and may take longer than you would intuitively expect. As you suggested in your first comment, you can pass a specific timeout value. For example, you could do:
robots.RobotsParser.from_uri("https://earthworks.stanford.edu/robots.txt", 2)
To test it, you can use the following example with a dummy port (timeout set to 1):
robots.RobotsParser.from_uri("https://robotspy.org:555/robots.txt", 1)
You can find examples in the tests directory, file test_network.py.
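Because the timeout parameter is not a wall-clock limit, a caller who needs a hard upper bound could enforce one on their side. The following is only a minimal sketch using the standard library: call_with_deadline and slow_fetch are hypothetical names for illustration and are not part of robotspy. It runs the call in a daemon thread and gives up waiting after the deadline. The abandoned thread is not stopped, it is simply left to finish in the background.

```python
import threading
import time


def call_with_deadline(func, *args, deadline=5.0, **kwargs):
    """Run func(*args, **kwargs) and wait at most `deadline` seconds for it.

    Hypothetical helper, not part of robotspy. Raises TimeoutError when the
    call has not returned in time. The worker thread is a daemon thread, so
    a call that never returns does not block interpreter exit.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the exception back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline)  # wait for the result, but no longer than deadline

    if thread.is_alive():
        raise TimeoutError(f"call did not finish within {deadline} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def slow_fetch(delay):
    """Stand-in for a slow network call (assumption, for demonstration)."""
    time.sleep(delay)
    return "done"
```

Assuming robotspy is installed, the same wrapper could be applied as call_with_deadline(robots.RobotsParser.from_uri, "https://earthworks.stanford.edu/robots.txt", 2, deadline=10), where 2 is the library's own timeout argument and 10 is the overall wall-clock limit in seconds.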
@borisiskra, I believe the issue you raised is a bug and is not timeout-related. I suspect a parser issue and need to debug it. I will open a separate issue after confirming this is unrelated to the timeout problem @ivbeg opened.
Thank you for finding this problem. I will do my best to resolve it as soon as possible.
Network timeout issue addressed in robotspy 0.11
Something similar happens with this robots.RobotsParser.from_uri("http://22-lr.forumactif.com/robots.txt") but I can download the robots.txt file and if I use: robots.RobotsParser.from_string(robots_downloaded_file) it also hangs forever. Note: the file size is 20624 chars
Fixed in robotspy version 0.12 (see issue #212)
Thank you, @borisiskra, for raising this problem ✨
Hi! from_uri sometimes hangs forever because there is no timeout. I tried robots.RobotsParser.from_uri("https://earthworks.stanford.edu/robots.txt") and it hangs. A default timeout of 10 seconds should be good enough, or it would be great if a timeout parameter could be passed to the from_uri function.