Hi @paddyroddy thanks for reporting this! We've had timeouts on HTTP requests hardcoded from #329 but I think the value we used was too strict (5s). It would be good to change all of these to use a package variable and set it to a larger number (30-60s maybe).
I'm updating the title of this issue to match. I won't have the time to do this soon but if anyone else is interested, then please be my guest 🙂
Another idea for implementing this would be to not make the timeout parameter configurable, but instead make the entire `requests.Session` configurable. The session is what controls the timeout, but also much more besides, so this could save us work on a lot of future feature requests that relate to how HTTP requests are executed.
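A rough sketch of what that could look like (this is not pooch's current API; the function and its signature are hypothetical):

```python
import requests

# Hypothetical downloader that accepts a preconfigured Session instead of
# exposing individual options like timeout one by one.
def http_download(url, output_file, session=None, timeout=30):
    # Fall back to a plain Session when the caller doesn't provide one
    session = session or requests.Session()
    response = session.get(url, timeout=timeout, stream=True)
    response.raise_for_status()
    with open(output_file, "wb") as fout:
        for chunk in response.iter_content(chunk_size=1024):
            fout.write(chunk)

# Callers could then configure headers, proxies, auth, retries, etc. once:
session = requests.Session()
session.headers.update({"User-Agent": "my-package/1.0"})
http_download("https://example.com/data.csv", "data.csv", session=session)
```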
@leouieda can you expand on what you mean by a "package variable"? I'm annoyed with my tests constantly failing, so I'm pretty keen to get this fixed.
I've had a look into the `requests.Session` approach, but it's kind of tricky with how `downloaders.py` is currently written. My thought was that you could replace `**kwargs` with a `session` object. But then requests are also made elsewhere, in `doi_to_url`, `ZenodoRepository.api_response`, `FigshareRepository.api_response`, and `DataverseRepository._get_api_response`, where a separate `requests.Session` object would have to be passed. I think instead I will try to set a global `timeout` variable for the package.
I'm still unsure about a "package variable". I've noticed several functions/classes have a `**kwargs` option to pass to `requests`, which would include `timeout` (or at least you'd expect it to). So would it make much sense to have `**kwargs` as well as a separate `timeout` argument?
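If the `**kwargs` really are forwarded to `requests`, then something like this should already set a per-fetch timeout (untested sketch; the URL and file name are placeholders):

```python
import pooch

dog = pooch.create(
    path=pooch.os_cache("my-package"),
    base_url="https://example.com/data/",  # placeholder base URL
    registry={"data.csv": None},  # None skips hash verification
)
# Extra keyword arguments to HTTPDownloader get passed on to requests,
# so the timeout (in seconds) should travel through with them
fname = dog.fetch("data.csv", downloader=pooch.HTTPDownloader(timeout=60))
```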
For now, I'm going to try the `retry_if_failed` option to `pooch.create` and hope that works well enough for me.
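Roughly like this (the URL and registry are placeholders):

```python
import pooch

dog = pooch.create(
    path=pooch.os_cache("my-package"),
    base_url="https://example.com/data/",  # placeholder base URL
    registry={"data.csv": None},
    retry_if_failed=3,  # retry each failed download a few times
)
fname = dog.fetch("data.csv")
```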
`retry_if_failed` still seems to fail with the `timeout` error. Any help on this issue would be great.
Hi @paddyroddy sorry for the delay. I meant having a global variable in a module somewhere that is used throughout the package. We can set this to something relatively high to help with flaky connections. But being a global also means it can be patched in an emergency.
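A minimal sketch of what I mean (the module and variable names are hypothetical):

```python
import requests

# e.g. somewhere like pooch/utils.py (hypothetical location)
DEFAULT_TIMEOUT = 30  # seconds; generous enough for flaky connections


def request_with_timeout(url, **kwargs):
    # Every HTTP call in the package routes through the global default,
    # unless the caller explicitly overrides it
    kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
    return requests.get(url, **kwargs)


# Being a plain module global, it can be patched in an emergency:
#   import pooch.utils
#   pooch.utils.DEFAULT_TIMEOUT = 120
```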
I agree that refactoring how we pass the `Session` objects around will be a much larger problem. Ideally, we'd have foreseen this issue when we first wrote all that code, but doing it now doesn't seem like a good solution. I'd rather wait until there is a good use case for configuring the entire `Session` to motivate putting in the work.
> I agree that refactoring how we pass the `Session` objects around will be a much larger problem. Ideally, we'd have foreseen this issue when we first wrote all that code, but doing it now doesn't seem like a good solution. I'd rather wait until there is a good use case for configuring the entire `Session` to motivate putting in the work.
Agreed. I think `Session` seems like a nice way to go in the future, but it's non-trivial to implement without breaking current behaviour. I think this compromise works well, thanks!
@paddyroddy v1.8.2 is now on PyPI and should be on conda-forge by tomorrow at the latest. Hope this helps!
@leouieda I've only done a few test runs of my CI, but it looks like it's done the trick! Thanks 🙂
@paddyroddy great to know! Feel free to open another issue or reopen this one if it comes back.
I do seem to occasionally hit that timeout, damn Zenodo - it didn't use to be a problem.
😢 yeah, they've been a bit slow recently. If it's sample data for a package, I tend to mirror it on GitHub and make CI download from there instead to avoid overloading them.
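For example (hypothetical repository and file names):

```python
import pooch

# Point CI at a GitHub mirror of the sample data instead of Zenodo
mirror = pooch.create(
    path=pooch.os_cache("my-package"),
    base_url="https://raw.githubusercontent.com/myorg/myrepo/main/data/",
    registry={"sample.csv": None},  # None skips hash verification
)
fname = mirror.fetch("sample.csv")
```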
Prior to the Zenodo upgrade last year I had no issues with downloading my Zenodo dataset. It has since been flaky and I sometimes get a timeout error. The relevant code is something like this:
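(A minimal sketch of pooch's DOI workflow; the DOI and file name below are placeholders.)

```python
import pooch

POOCH = pooch.create(
    path=pooch.os_cache("my-package"),
    base_url="doi:10.5281/zenodo.0000000/",  # placeholder DOI
    registry=None,
)
# Populate the registry from the archive's metadata; this is the
# request to Zenodo that times out
POOCH.load_registry_from_doi()
fname = POOCH.fetch("my-dataset.csv")
```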
Is it possible to increase the timeout of `load_registry_from_doi`?