bgptools / issues

The public issue tracker for bgp.tools

Displaying the logic behind an alert would be extremely helpful #124

Open athompson-merlin opened 7 months ago

athompson-merlin commented 7 months ago

URL:

https://bgp.tools/authed/manage-alerts?detail=d62ed4fa-2f2f-48d0-ab52-c909dff5da6e

Browser (User-Agent if you can):

n/a

Error base64 blob (If a site error):

n/a

What were you expecting:

Explanation of WHY this alert is being generated. Alert terminology is not 100% standardized, and regional variations exist; for example, someone I was talking to thought a "hijack" meant a ransomware-vectored takeover of an organization! (They were familiar with the BGP hijack concept, they just didn't know it had a name. No, they don't read NANOG :-) )

Even I would appreciate my automated tools showing me the logical condition that just succeeded/failed and triggered this alert, i.e. is a hijack alert based on RPKI, on IRR, or on BGP, where was it seen, etc. And if IRR, is it from ARIN, RADB, CANARIE, some other database...?

What happened:

Alert just told me "a hijack happened", told me the ASN and prefix involved, but no other details.  Obviously I've made a mistake somewhere, since I manage both ASNs involved, but I don't know where to look for my error.

benjojo commented 7 months ago

I wrote up mini KB pages for each alert type with extra info on how/why they are generated, for example: https://bgp.tools/kb/alert-help-hijack

These are now also provided as links on each alert details page:

image

I also added "more detail" URLs to a handful of alerts where it now makes sense to do so.

athompson-merlin commented 7 months ago

That's helpful, but in the example I linked to, what I had in mind (but didn't put into writing, my bad...) was more that - great, there's a hijack for prefix X, great, and it's being done by AS X, great, and the new is is [ X ], great... but until I'm familiar with the system I have no clue what data you're using to trigger this alert, i.e. I don't know why I'm getting this alert in the first place.

I can't figure out from the alert that I manually added this prefix during setup, thereby telling your system that AS16796 should always be advertising it.
(I also certainly can't tell that I have to click on my ASN in the top-right corner, then the Monitoring tab, then Settings, then click "Edit" beside AS16796, then remove that prefix from the table, but that's a separate ticket about the UI/UX...)

athompson-merlin commented 7 months ago

Further in that vein... the detection logic obviously already looked up the relevant IRR and RPKI records and based on them decided NOT to suppress the alert... having those shown to me (hyperlinks would help) would let me understand that I forgot to edit the !@#$%^ RADB IRR record when I changed what router advertises this prefix.

I think exposing the "business logic" would be significantly more helpful than a "something is wrong" email.
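
To make "exposing the business logic" concrete, here is a rough Python sketch of an alert that carries its own explanation. This is not bgp.tools' actual detection code: the rule (fire only when no data source authorises the observed origin), the function name `evaluate_hijack`, the source labels, and the output format are all assumptions for illustration. The prefix and ASNs are documentation-reserved values, not real data.

```python
from dataclasses import dataclass


@dataclass
class Check:
    source: str    # where the data came from, e.g. "IRR" or "RPKI"
    allowed: set   # origin ASNs this source authorises for the prefix
    passed: bool   # True if the observed origin is in `allowed`


def evaluate_hijack(prefix, seen_origin, sources):
    """Return (fired, explanation) for one observed announcement.

    `sources` maps a source name to the set of origin ASNs it authorises.
    Assumed rule: the alert fires only if no source authorises the origin.
    """
    checks = [Check(name, allowed, seen_origin in allowed)
              for name, allowed in sources.items()]
    fired = not any(c.passed for c in checks)

    # Build a human-readable trace of every check that was evaluated.
    lines = [f"{prefix} seen with origin AS{seen_origin}: "
             + ("ALERT" if fired else "suppressed")]
    for c in checks:
        verdict = "match" if c.passed else "no match"
        allowed = ", ".join(f"AS{a}" for a in sorted(c.allowed)) or "none"
        lines.append(f"  {c.source}: {verdict} (authorised: {allowed})")
    return fired, "\n".join(lines)


# Hypothetical scenario: the user config and a stale IRR record both still
# point at the old origin, and no RPKI record covers the prefix.
fired, explanation = evaluate_hijack(
    "192.0.2.0/24",
    64501,
    {
        "user-config": {64500},
        "IRR": {64500},
        "RPKI": set(),
    },
)
print(explanation)
```

With a trace like this attached to the alert, the reader can see at a glance that the user-configured origin and the IRR record both disagree with what was observed, which points straight at the record that needs editing.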

athompson-merlin commented 7 months ago

I saw the new text in the latest alert, that's helpful, thank you!