Open hmpf opened 4 years ago
These objects should be get_or_created early and easily and often. Maybe a management command named "setup" or "verify" or something, or get_or_created on each use of the function.
The first use of the function could be when setting up the system for the first time, a "Hello, World" incident, low severity!
There is now a way to auto-create an argus user/source/source type. What's left is to create a suitable incident every time argus complains about something in its logs.
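One way the remaining piece might look is a standard-library `logging.Handler` that forwards error-level records to an incident-creating callback. This is only a sketch: `SelfIncidentHandler` and the `create_incident` callback are hypothetical names, not existing Argus code, and the per-thread guard is an assumption about how to stop the handler from reacting to errors logged while it is itself creating an incident.

```python
import logging
import threading
from typing import Callable


class SelfIncidentHandler(logging.Handler):
    """Forward log records at or above `level` to an incident callback.

    `create_incident(description, levelno)` is a hypothetical callback,
    standing in for whatever actually writes the incident.
    """

    def __init__(self, create_incident: Callable[[str, int], None],
                 level: int = logging.ERROR) -> None:
        super().__init__(level=level)
        self._create_incident = create_incident
        self._local = threading.local()  # per-thread re-entrancy flag

    def emit(self, record: logging.LogRecord) -> None:
        # If creating the incident logs an error itself, do not recurse.
        if getattr(self._local, "busy", False):
            return
        self._local.busy = True
        try:
            self._create_incident(self.format(record), record.levelno)
        except Exception:
            # Standard logging behaviour for a failing handler.
            self.handleError(record)
        finally:
            self._local.busy = False
```

Attached with `logging.getLogger("argus").addHandler(SelfIncidentHandler(callback))` (the logger name is also an assumption), records below `ERROR` are ignored and each error produces at most one callback invocation per thread at a time.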
This feature sounds very useful!
Still, I am a bit worried that, in certain cases, Argus might overload itself with its own error messages if this were implemented naively: an error triggers an incident, which matches a filter and is sent out by mail, which causes another error that triggers another incident, ad infinitum. Congrats on DoSing yourself and/or taking a whole Argus instance down.
So, two requirements that should be met before implementation: 1) A clear, exhaustive, written spec of which errors can cause incidents and which cannot. 2) A mechanism to prevent Argus from choking on its own incidents: some filtering message queue, or another mechanism for rate limiting.
Removing the "good first issue" tag for the reasons above. The actual code change may be easy to make, but it seems wise to reduce the risk a bit before tackling an implementation.
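For the second requirement, a sliding-window limiter is one possible mechanism. The sketch below is an illustration under assumptions, not a proposal for the final design: `IncidentRateLimiter`, its parameters, and the `dropped` counter are all made-up names, and it covers only rate limiting, not the spec of which errors qualify.

```python
import time
from collections import deque
from typing import Callable, Deque


class IncidentRateLimiter:
    """Allow at most `max_events` self-incidents per `window` seconds."""

    def __init__(self, max_events: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_events = max_events
        self.window = window
        self._clock = clock  # injectable so the limiter can be tested
        self._stamps: Deque[float] = deque()
        self.dropped = 0  # how many incidents were suppressed so far

    def allow(self) -> bool:
        now = self._clock()
        # Forget events that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.max_events:
            self._stamps.append(now)
            return True
        self.dropped += 1
        return False
```

A caller would check `limiter.allow()` before creating a self-incident and skip creation when it returns `False`. The `dropped` count could be folded into the next incident that does get through, so suppressed errors are not silently lost.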
Another approach would be sending a notification through Argus without creating an incident. Details to be discussed later.
This came up again in #760.
Errors that argus can detect in itself, for instance a failure to send a notification because the notification endpoint isn't answering (email server down, say), should be reported as incidents.
This means we need a SourceType "argus" and a SourceSystem representing the host argus is running on. Named "self" maybe? "me"? I suspect hostname would be tricky. Also, a function/method argus can use to write to the incidents-table, with SourceType/SourceSystem locked.
(This is very nice, because we can dogfood the system using itself, triggering errors in argus in order to have incidents turn up in argus :) )
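A rough sketch of the "locked" reporting function, combined with the get-or-create-on-each-use idea, could look like the following. It uses an in-memory stand-in instead of the real tables so that it runs on its own. Every name here (`Source`, `IncidentStore`, `make_self_reporter`, the `"self"` source name, the default `level`) is an assumption for illustration; in Django the lookup would presumably go through the ORM's `get_or_create` instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Source:
    source_type: str
    name: str


@dataclass
class IncidentStore:
    """In-memory stand-in for the source and incident tables."""
    sources: Dict[Tuple[str, str], Source] = field(default_factory=dict)
    incidents: List[Tuple[Source, str, int]] = field(default_factory=list)

    def get_or_create_source(self, source_type: str, name: str) -> Tuple[Source, bool]:
        key = (source_type, name)
        if key in self.sources:
            return self.sources[key], False
        self.sources[key] = Source(source_type, name)
        return self.sources[key], True


def make_self_reporter(store: IncidentStore, source_type: str = "argus",
                       name: str = "self") -> Callable[..., None]:
    """Return a function that can only write incidents for one fixed source."""

    def report(description: str, level: int = 5) -> None:
        # Get-or-create on every call, so a missing source is repaired on use.
        # level=5 is assumed here to mean "lowest severity".
        source, _created = store.get_or_create_source(source_type, name)
        store.incidents.append((source, description, level))

    return report
```

Calling `report = make_self_reporter(store)` once and then `report("Hello, World")` would give the low-severity first incident mentioned above. Callers of `report` cannot pick a different source, because the source type and name are fixed in the closure.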