Uninett / Argus

Argus is an alert aggregator for monitoring systems
GNU General Public License v3.0
18 stars 13 forks source link

Allow a source to delete its incidents in API #805

Closed hmpf closed 4 months ago

hmpf commented 4 months ago

Closes #804

Missing/will we need it/to decide:

  1. Should there be an event to mark the deletion of an incident? It cannot be on the incident to be deleted.
  2. Should there be a hard time window where deletion is allowed? What should the settings-name be to set the duration?
  3. Should there be a field on the incident that says deletion is allowed? What should the field be named? What should the default be?
  4. Depends on queue: If there is a field on the incident that allows deletion, should there be an agent that toggles this field after X seconds? (See no. 3 above)
  5. Should there be a setting that toggles this on or off? If off, deletion is not allowed.
codecov-commenter commented 4 months ago

Codecov Report

Attention: Patch coverage is 95.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 84.48%. Comparing base (f1320d8) to head (0fdc425).

Files Patch % Lines
src/argus/incident/views.py 95.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #805 +/- ## ========================================== + Coverage 84.38% 84.48% +0.10% ========================================== Files 76 76 Lines 3784 3802 +18 ========================================== + Hits 3193 3212 +19 + Misses 591 590 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

github-actions[bot] commented 4 months ago

Test results

       7 files     588 suites   22m 4s :stopwatch:    467 tests    466 :heavy_check_mark: 1 :zzz: 0 :x: 3 269 runs  3 262 :heavy_check_mark: 7 :zzz: 0 :x:

Results for commit 02b0f7b1.

:recycle: This comment has been updated with latest results.

hmpf commented 4 months ago
  1. Should there be an event to mark the deletion of an incident? It cannot be on the incident to be deleted.

I have no strong feeling towards this

Me neither.

  1. Should there be a hard time window where deletion is allowed? What should the settings-name be to set the duration?

Is there already a use case defined where Argus wants to enforce this?

Nope.

  1. Should there be a field on the incident that says deletion is allowed? What should the field be named? What should the default be?

I think this would add unneccesary complexity.

This is the phase-thing though. It's supposed to blink and whatnot before it is no longer legal to delete.

  1. Should there be a setting that toggles this on or off? If off, deletion is not allowed.

perhaps?

I think in general when exposing delete functionality, it can be confusing if it sometimes is allowed and sometimes isn't. I'm wonding what would be the reason for disallowing deletion. My guess would be that it might have to do with leaving an (audit) trail of incidents, but that's just speculation on my part and if so, it'd be highly dependent on the context in which Argus is run. My feeling is that Argus should make as little assumptions about this as possible.

Currently, we don't allow deletion at all. Events are the audit trail of an incident and events depend (foreign key) on an incident. We've had no need for it whatsoever. So to maintain existing functionality (incidents are not deletable) we need at least this setting.

hmpf commented 4 months ago

We could also have that a toggle on a source that this source is allowed to delete but that can wait.

hmpf commented 4 months ago

I'll add a setting to turn deletion on and off. Then we won't change existing production systems.

johannaengland commented 4 months ago

I'll add a setting to turn deletion on and off. Then we won't change existing production systems.

I think that is a good way of handling the different use cases. That way the responsibility of handling deletion lies with that instance and we don't have to worry about it too much

lunkwill42 commented 4 months ago

Agreed that deletion should only be possible if turned on by a config option.

However, what we discussed at the last meeting is that the ability to delete incidents would be phase-dependent. I.e. if/when we introduce incident phases, it would make sense that an incident in the pending phase could be deleted. Once an incident passes to the finalized state, it would not be deletable (as much for auditing reasons as anything else, but it would be consistent with Argus' current design). In the current design, all incidents are seen as finalized, in practice, by Argus.

lunkwill42 commented 4 months ago

I would relate this to both #804 and #806

hmpf commented 4 months ago

Deletion now works even without hashing out phases so that coding and discovery can continue.

hmpf commented 4 months ago

This is now feature complete, no migrations needed.

hmpf commented 4 months ago

The way it is planned to be used (before we use it ourselves) the only event that will exist at the time of deletion is "INCIDENT_START" so acks won't be a problem. This is an interim thing, awaiting how we'll make "pending" phase possible.

johannaengland commented 4 months ago

The way it is planned to be used (before we use it ourselves) the only event that will exist at the time of deletion is "INCIDENT_START" so acks won't be a problem. This is an interim thing, awaiting how we'll make "pending" phase possible.

Yes, in that context it makes sense. I think we should still add it, for completion's sake. It does not sit right with me that I can produce an internal server error. We can remove that again when we add handling of the phases if you want to.

hmpf commented 4 months ago

Man, this is a reminder why I don't like Acknowledgement being a separate model. Optional field on Event would have simplified so much.

sonarcloud[bot] commented 4 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud