Open hannahwhy opened 9 years ago
People have shoved enough stuff through ArchiveBot that we're now running into a problem where certain high-usage fetch nodes are being banned from large website providers. One of them appears to be banned from all Squarespace sites.

There's a similar problem with nodes that are started on heavily filtered networks. There's some protection against this with the `CheckIP` task, but we can't cover all the bases there. For example, we had a node in Singapore that would have been unable to grab anything that fell under the censorship list of the Singapore Media Development Authority.

It would be nice to identify when it looks like a node cannot complete a job due to these conditions and send out an alert. With suspend/resume it would then be possible to move that job to a different node.

One way to do this might be to start jobs on multiple pipelines on (e.g.) different ASes, but the current job dequeuing setup doesn't allow that.

Perhaps some sort of prefetch check task might work: if the first few responses are 4xx or network timeouts, the job is failed and put back into the queue.