Closed chorsley closed 4 years ago
As @vr0al already correctly stated in #1216, `EAI_AGAIN` / `-3` means "Temporary failure in name resolution", which is not a permanent error. So in this case you probably have some domains that permanently trigger temporary errors. Since temporary failures are usually transient and their cause needs to be fixed, just ignoring them is IMHO not the right way.
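For reference, a minimal sketch of how the numeric code can be mapped back to its symbolic name using only the standard `socket` module. The helper name `describe` is hypothetical, and the numeric values are platform-specific (`-3` for `EAI_AGAIN` is what Linux/glibc uses; other platforms may differ).

```python
import socket

# Build a reverse lookup from numeric getaddrinfo error codes to their names.
# Only constants that exist on the current platform are included.
EAI_NAMES = {
    getattr(socket, name): name
    for name in dir(socket)
    if name.startswith("EAI_")
}


def describe(code):
    """Return a readable label for a socket.gaierror code (hypothetical helper)."""
    return f"{EAI_NAMES.get(code, 'unknown')} ({code})"


if __name__ == "__main__":
    # On Linux/glibc this is expected to show EAI_AGAIN with the value -3.
    print(describe(socket.EAI_AGAIN))
```

Comparing against `socket.EAI_AGAIN` instead of a literal `-3` would avoid hard-coding a value that is only valid on some platforms.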
Detecting this kind of permanent temporary failure is a lot of work, so the realistic option would be to ignore these errors and log a warning.
> the bot raises an error, exits, waits 15 seconds, then restarts three times per non-resolving hostname. That creates a large pipeline bottleneck at the gethostbyname expert step when we're trying to process a large backlog of possibly non-resolving hostnames.
You can change this behavior: https://github.com/certtools/intelmq/blob/develop/docs/User-Guide.md#error-handling
Any opinions on what I wrote @chorsley ?
I suggest optionally ignoring this temporary error (via a parameter, opt-in for backwards compatibility).
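A rough sketch of what such an opt-in could look like, not the actual IntelMQ implementation. The function `resolve`, the parameter `skip_temporary_failures`, the injectable `resolver` argument and the logger name are all hypothetical; the set of already-ignored codes is taken from the list quoted in this issue (without `-3`) and assumes Linux/glibc values.

```python
import logging
import socket

logger = logging.getLogger("gethostbyname-sketch")  # hypothetical logger name

# Assumption: codes already treated as "does not resolve", per the list quoted
# in this issue minus -3. These numeric values are Linux/glibc-specific.
IGNORED_CODES = {-2, -4, -5, -8, -11}


def resolve(hostname, skip_temporary_failures=False, resolver=socket.gethostbyname):
    """Return the resolved IP address, or None if the lookup failure is ignored.

    With skip_temporary_failures=False (the default, i.e. the old behaviour),
    EAI_AGAIN is re-raised so the caller's normal error handling applies.
    """
    try:
        return resolver(hostname)
    except socket.gaierror as exc:
        code = exc.args[0]
        if code in IGNORED_CODES:
            return None
        if code == socket.EAI_AGAIN and skip_temporary_failures:
            # Opt-in: log a warning and move on instead of failing.
            logger.warning("Ignoring temporary resolution failure for %r.", hostname)
            return None
        raise
```

Because the default stays `False`, existing setups would keep raising on `EAI_AGAIN`; only instances that explicitly enable the flag would skip such hostnames with a warning.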
As I got no feedback, I have now implemented it that way.
On a new IntelMQ instance, we're currently processing the Phishtank feed. The feed data includes a lot of very old URLs back to 2017, and many of the hostnames in there no longer resolve (i.e. NXDOMAIN) as you'd expect.
In the gethostbyname expert, `socket.gethostbyname()` fails with a -3 error code for these. Since -3 is not included in the expected result codes at https://github.com/certtools/intelmq/blob/develop/intelmq/bots/experts/gethostbyname/expert.py#L41, the bot raises an error, exits, waits 15 seconds, then restarts three times per non-resolving hostname. That creates a large pipeline bottleneck at the gethostbyname expert step when we're trying to process a large backlog of possibly non-resolving hostnames.

The easy fix would be to add -3 to line 41, i.e.
if exc.args[0] in [-2, -3, -4, -5, -8, -11]:
so that the bot could just move on smoothly without raising an error.

This seems to have been discussed before (e.g. https://github.com/certtools/intelmq/issues/1216) without resolution. I'm tempted to just make this change myself since it's proving a large drag on processing performance, but I'm wondering whether any reasons have been identified NOT to do it?