This is similar to #129 but broader in scope.
If a job is started for a URL that fails repeatedly, e.g. an inexistent domain or a host that's currently timing out, grab-site never stops retrying it.
The problem lies here:
https://github.com/ArchiveTeam/grab-site/blob/5e75c56a7d6ee405083b2f0c3534d67b2208edd8/libgrabsite/wpull_hooks.py#L355-L357
This should instead be `return verdict`. `accept_urls` is always called, even if the wpull-internal filters already decided that a URL shouldn't be grabbed. In this particular case, that would be due to the `TriesFilter`, which matches once the URL has been tried three times (because grab-site passes `--tries 3` to wpull). But the hook always returns `True` for the initial URL, so it's retried indefinitely.
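A minimal sketch of the proposed fix. The hook signature and the surrounding logic here are assumptions modelled loosely on wpull's hook API, not grab-site's actual code; the point is only that the hook should propagate wpull's own verdict rather than unconditionally accepting the initial URL:

```python
# Hypothetical sketch (signature assumed, simplified from the real hook):
# wpull calls accept_url for every candidate URL, passing the verdict its
# internal filters (including TriesFilter) already reached.
def accept_url(url_info, record_info, verdict, reasons):
    # ... grab-site's own ignore/acceptance checks would run here ...

    # Buggy behavior: returning True for the initial URL overrides the
    # TriesFilter, so a dead host is retried forever.
    # Fixed behavior: pass the wpull-internal verdict through, so a URL
    # that has exhausted its --tries is rejected.
    return verdict
```

With this change, once the `TriesFilter` rejects a URL after three attempts, the hook no longer resurrects it, and the job can finish.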