cr0hn / golismero-legacy

THIS IS A LEGACY VERSION PRESERVED FOR BACKUP, DO NOT USE
http://golismero-project.com

Alert when a target host is down #229

Open · MarioVilas opened this issue 10 years ago

MarioVilas commented 10 years ago

This feature would be terribly useful, but not at all trivial to implement.


First, we have to define what a "host down" actually is. @cr0hn proposed some time ago that any host, identified either by domain or IP, that ceases to respond N times in a row should be considered "down", where "cease to respond" means TCP failures only (a closed or filtered port, an open port that times out when reading the incoming stream, or an open port that times out when writing the outgoing stream).
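
A rough sketch of that counting logic, keyed by (host, port) so each service is tracked separately (the `FailureTracker` name and the default threshold are made up for illustration):

```python
from collections import defaultdict

class FailureTracker:
    """Counts consecutive TCP failures per (host, port) service."""

    def __init__(self, threshold=3):          # N, the consecutive-failure limit
        self.threshold = threshold
        self.failures = defaultdict(int)

    def record_failure(self, host, port):
        """Return True when the service just crossed the threshold."""
        self.failures[(host, port)] += 1
        return self.failures[(host, port)] == self.threshold

    def record_success(self, host, port):
        """Any successful exchange resets the streak."""
        self.failures[(host, port)] = 0

    def is_down(self, host, port):
        return self.failures[(host, port)] >= self.threshold
```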

For this we'd need to hook all socket operations to find out the target host and TCP port, and to catch all errors. This can be done by monkey-patching the socket library and catching socket exceptions as they occur. A specific message for this event would then be sent to the Orchestrator, and after that the exception would be raised normally. The monkey-patching should only affect sockets created by plugins and connected to hosts within scope - connections to external APIs and connections made by GoLismero itself should go undisturbed.
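
Something along these lines, for the connect() case only (read/write timeouts would need similar hooks on recv() and send()). The `in_audit_scope` and `notify_orchestrator` helpers are stand-ins for the real scope check and message send, and the sketch assumes plain AF_INET (host, port) addresses:

```python
import socket

_tracker = FailureTracker()               # reusing the sketch above
_real_connect = socket.socket.connect     # keep a reference to the original

def in_audit_scope(host):                 # stand-in for the real scope check
    return True

def notify_orchestrator(host, port):      # stand-in for the real message send
    print("host down: %s:%d" % (host, port))

def _patched_connect(self, address):
    """Wrap connect() to record TCP failures before re-raising them."""
    try:
        _real_connect(self, address)
    except socket.error:
        host, port = address              # assumes an AF_INET 2-tuple
        if in_audit_scope(host) and _tracker.record_failure(host, port):
            notify_orchestrator(host, port)
        raise                             # the caller still sees the error
    else:
        _tracker.record_success(*address)

socket.socket.connect = _patched_connect
```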


Second, we need to define what we want to happen when a host is determined to be down. In principle we need the hostname or IP, but also the TCP port, since one service may be down while other services on the same host are still up, and we probably want a separate alert for each service.

If we just send alerts, we run the risk of flooding the user with alerts if a host goes up and down intermittently. So it's probably better to temporarily disable all tests on a host after it goes down. This means making changes to the AuditScope object so we can mark a host/port combination as down, and it would translate that into scope checks for concrete objects in our data model (for example, if the HTTP service on a host is down, all URLs pointing to it would suddenly become out of scope).
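
For example, a down HTTP service could knock its URLs out of scope like this (a minimal sketch; the real AuditScope has far more state than this):

```python
from urllib.parse import urlparse

class AuditScope:
    """Toy version that only knows about hosts and dead services."""

    def __init__(self, hosts):
        self.hosts = set(hosts)
        self.dead_services = set()        # (host, port) pairs marked as down

    def mark_down(self, host, port):
        self.dead_services.add((host, port))

    def url_in_scope(self, url):
        parts = urlparse(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        if (parts.hostname, port) in self.dead_services:
            return False                  # URLs of a dead service drop out
        return parts.hostname in self.hosts
```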

This would also have an impact on other places in the code, as we're currently assuming the scope does not change during an audit. Probably we'll have to disable quite a few scope checks meant as optimizations, and databases may grow larger.

Another possibility is to have a different scope-like object, used not for checking whether something is in scope, but for checking whether we're allowed to connect to it. Currently both checks amount to the same thing, but with the host-down logic they would not. (There are also other circumstances where this distinction is desirable, see ticket #94.) I like this option better than hacking AuditScope. However, it could be rather confusing for users trying to write plugins.
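
The distinction would look something like this (names invented for the sketch): the scope never changes during the audit, while the connection policy does.

```python
class ConnectionPolicy:
    """Separates "may we connect?" from "is it in scope?" (sketch)."""

    def __init__(self, scope_hosts):
        self.scope_hosts = set(scope_hosts)   # fixed for the whole audit
        self.blocked = set()                  # (host, port) pairs marked down

    def in_scope(self, host):
        return host in self.scope_hosts       # never changes mid-audit

    def can_connect(self, host, port):
        # A dead service stays in scope (its data still belongs to the
        # audit), but connecting to it is temporarily forbidden.
        return self.in_scope(host) and (host, port) not in self.blocked

    def mark_down(self, host, port):
        self.blocked.add((host, port))
```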


If we also want to consider high-level DoS, a possible solution is to expose an API for plugins to manually report a "host down" event. The downside is that we'd be leaving too much work to the plugin developers. And since it'd be optional, we can't count on them actually using this API at all.
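
Purely hypothetical, since this API was never built, but a plugin could report the event with something like:

```python
class DummyPlugin:
    """Illustrative only: none of these names exist in GoLismero."""

    def send_control_message(self, message):
        # Stand-in for the real plugin-to-Orchestrator transport.
        print("-> Orchestrator:", message)

    def report_host_down(self, host, port, reason):
        """The optional call a plugin would make on detecting high-level DoS."""
        self.send_control_message({
            "type": "host_down",
            "host": host,
            "port": port,
            "reason": reason,     # e.g. "login form stopped responding"
        })
```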

I don't think there's a truly generic solution for this, but maybe we could add some HTTP-specific logic, if we can think of any, for some specific circumstances.

We can't simply treat 5xx errors as a sign that the host is down, since we don't know the reason for the error - it could be something important, like the HTTP server saying the webapp can't be run, or the webapp crashing because the database server is down... or it could be something unimportant, like a web programmer who thought returning code 500 when the user wasn't logged in was a good idea.

Maybe we can detect that a certain request was replayed, the old response was 2xx, and the new one became 5xx. It's not foolproof either, and the drawback is we'd have to wait for the exact same request to be made by accident (and in general we don't want duplicated requests in the first place!).
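
The check itself is simple enough; the hard part is getting the duplicate request in the first place. A sketch, keying responses by method, URL and body hash:

```python
import hashlib

_seen = {}   # (method, URL, body hash) -> last status code observed

def service_degraded(method, url, body, status):
    """True when a request that once returned 2xx now returns 5xx."""
    key = (method, url, hashlib.sha256(body or b"").hexdigest())
    previous = _seen.get(key)
    _seen[key] = status
    return previous is not None and 200 <= previous < 300 and status >= 500
```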

Yet another option is to implement some rudimentary checks as a way to flag a host as "possibly down". Then a probe of some kind could implement more intelligent tests. This would be rather complicated and hard to maintain, though, so I don't quite like it.

One more idea: some webapps leak exception tracebacks. If we can parse them, we can detect some specific errors. An example: most PHP webapps crash when the SQL database they connect to is down, and show a crude error message to the user. We could detect that to mark a host as down and stop the scan. It's simple, but a little too specific. Still, better than nothing! :)
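
For the PHP/SQL case, a handful of regular expressions would already catch the most common leaked error messages (the patterns below are illustrative, not exhaustive):

```python
import re

# Typical "database unreachable" strings leaked by PHP webapps.
DB_DOWN_PATTERNS = [
    re.compile(r"Can't connect to (local )?MySQL server", re.I),
    re.compile(r"SQLSTATE\[HY000\] \[2002\]", re.I),
    re.compile(r"pg_connect\(\): Unable to connect", re.I),
]

def looks_like_db_down(response_body):
    """True if the page leaks a 'database is down' error message."""
    return any(p.search(response_body) for p in DB_DOWN_PATTERNS)
```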

There's also the problem of having multiple web applications on the same host - if one fails, do we stop scanning all the others? The answer is no, of course, but... how do we know for sure which URL belongs to which webapp? The mapping may not be trivial.
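
A crude heuristic would be treating the first path segment as the webapp root, which already breaks for apps spanning several directories:

```python
from urllib.parse import urlparse

def webapp_key(url):
    """Crude guess: /blog/admin and /blog/post are one app, /shop/ another."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    return (parts.hostname, parts.port, segments[0] if segments else "")
```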

What I see as the best solution for high-level protocols is just sending some alerts when suspicious behavior occurs, but continuing the scan for those targets. Only for hard network errors should we stop the scan for the service that went down. This means we'll have to figure out how to avoid flooding the user with alerts - maybe with some time limits, such as a maximum number of identical alerts over a minimum period.
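
That throttling could be as simple as a sliding window per alert key (the max_alerts and window values are placeholders):

```python
import time
from collections import defaultdict

class AlertThrottle:
    """Drops identical alerts beyond a per-window quota (sketch)."""

    def __init__(self, max_alerts=5, window=300.0):   # 5 alerts per 5 minutes
        self.max_alerts = max_alerts
        self.window = window
        self.history = defaultdict(list)              # alert key -> timestamps

    def allow(self, key):
        now = time.time()
        recent = [t for t in self.history[key] if now - t < self.window]
        if len(recent) >= self.max_alerts:
            self.history[key] = recent
            return False                              # too many identical alerts
        recent.append(now)
        self.history[key] = recent
        return True
```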