alexz-enwp / wikitools

Python package for working with MediaWiki wikis
105 stars 51 forks source link

Allow finer control of API errors in client code. #43

Open eggpi opened 7 years ago

eggpi commented 7 years ago

Hi! Citation Hunt uses wikitools on its (batch) jobs that process Wikipedia articles, both for fetching the article's text and converting wikicode to HTML.

I was bit by the fact that wikitools will sometimes retry forever upon API errors, which was, interestingly enough, mitigated by another hack I have in place for persisting connections (I'll file a separate issue for that later).

For the purposes of Citation Hunt's batch jobs, I don't particularly care if a single request fails, and I can just as well move on to another one, so it would be nice to have that kind of control. Maybe this could just raise an exception for the client to handle cleanly?

In addition to that, for tools that perform live queries against the API to serve requests, it would be nice to have some control over maxlag errors. wikitools will currently time.sleep when one happens, but if an exception was raised instead, the client might be able to handle it some other way. The sleep could be made optional for compatibility and sane out-of-the-box behavior.

What do you think? I'll be happy to try my hand at this, once we agree on the changes to be made :)

mzmcbride commented 7 years ago

Hi. Glad to hear that wikitools is useful to you.

I guess this issue is specifically referring to this FIXME: https://github.com/alexz-enwp/wikitools/blob/b71481796c350/wikitools/api.py#L309? If so, I think this section of code is appropriately labeled (that is, it is dumb behavior that we should improve) and we would welcome any pull requests that make the behavior better/smarter.

Maybe the query method could have a retries=2 default argument? I think retrying once is fine, but if the site is truly down/broken, we obviously don't want to keep hitting it forever. I'm also fine with raising an exception and having clients catch and handle that.

In that area of code, https://github.com/alexz-enwp/wikitools/blob/b71481796c350/wikitools/api.py#L306 also seems kind of janky to me, but perhaps that should be looked at separately.

eggpi commented 7 years ago

Thanks for the quick response!

I guess this issue is specifically referring to this FIXME: https://github.com/alexz-enwp/wikitools/blob/b71481796c350/wikitools/api.py#L309?

Pretty much, though I'm also interested in raising exceptions without retrying for (presumably) transient errors, such as maxlag (the comment only mentions things that are "never going to work"). Making retries a parameter, as you suggested, would take care of that too.

Maybe the query method could have a retries=2 default argument? (...) I'm also fine with raising an exception and having clients catch and handle that.

Sounds good to me. A quick look suggests that urllib3 also does retries + exceptions by default, which is nice.

In that area of code, https://github.com/alexz-enwp/wikitools/blob/b71481796c350/wikitools/api.py#L306 also seems kind of janky to me, but perhaps that should be looked at separately.

I suppose the common theme here is sensibly raising API errors as exceptions to client code - be it maxlag being reached, disabled API, or whatever else - so they can potentially handle it. In that sense, it might belong in the same PR too.

I guess we're on the same page overall, so I'll try to come up with a PR tomorrow and hopefully we can work out the details there. Thanks again!