coralnet / pyspacer

Python based tools for spatial image analysis
MIT License
6 stars 2 forks source link

Add timeout to URLStorage's urlopen() usages #90

Closed StephenChan closed 5 months ago

StephenChan commented 5 months ago

I noticed that the URLStorage test_exists() test would sometimes hang indefinitely on my local machine. Turns out it was happening on UNREACHABLE_URL, an undefined coralnet URL which is supposed to return 404. Seems that was happening because urlopen(), used in URLStorage's exists(), doesn't define a timeout by default. And of course, the coralnet site has been known (especially recently) to be unresponsive for minutes at times.

I don't know whether the deploy-API jobs running on Batch had timeouts or not, but for what it's worth, I haven't seen any TimeoutError logged in the past couple of months (i.e. since urlopen() replaced wget here in production). One would think a lack of timeout there could endanger our Batch resources. I'm not sure what are the chances that could also propagate to endangering the web server's responsiveness; it might.

What I do know is that my local environment seems to benefit from an explicit timeout in URLStorage's urlopen() usages, and that it wouldn't hurt to have for production as well. So that's what this PR adds.

I took some time to mess around with Python sockets to confirm the semantics of the timeout kwarg. It seems that it gets a URLError if the initial part of the response doesn't arrive by the timeout, and it gets a socket.timeout (superseded by and aliased to TimeoutError in Python 3.10, which is coralnet's Python version) if there's a continuous idle period surpassing the timeout in the middle of sending the response. It is not a timeout enforced against the entire duration of the response, so if there is just a lot of data to send over a low-bandwidth connection, then that's not a concern.

URLStorage test updates in the meantime: