Open legoktm opened 1 month ago
@legoktm I'm happy for you to keep it self-contained here; I'm just not sure of the most elegant way to do that. Right now, I think all the Slack notifications happen inside the VM, and in this case you probably want the check/alert to fire from the ESXi host.
Did you have any ideas around that? I think it basically would involve adding Slack API request logic to `run.py`, around the spot where it throws the error.
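As a rough sketch of what that logic might look like, assuming a Slack incoming webhook is used (the thread doesn't say which Slack API is preferred): the function names and the `SLACK_WEBHOOK_URL` environment variable below are hypothetical, and nothing here reflects how `run.py` is actually structured.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url, message):
    """Build a POST request carrying a plain-text message for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_slack(message, webhook_url=None, opener=urllib.request.urlopen):
    """Send `message` to Slack. Returns True on HTTP 200, False otherwise.

    SLACK_WEBHOOK_URL is a hypothetical variable name, not one the repo defines.
    `opener` is injectable so the function can be exercised without network access.
    """
    webhook_url = webhook_url or os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        return False
    try:
        with opener(build_slack_request(webhook_url, message), timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # urllib.error.URLError and HTTPError are OSError subclasses; a failed
        # notification should not mask the original error in the caller.
        return False
```

The idea would be to call something like `notify_slack("run.py --update failed: ...")` from the error-handling path on the ESXi host and then re-raise, so the original failure still surfaces in the logs.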
Alternatively, something we could do is add an Elastalert check for the `vim.vm.guest.ProcessManager.ProcessInfo` that occurs in the log on error. It's not Icinga, but it's the shortest path to getting the alert detected and firing in the appropriate Slack channel. This might also capture errors other than just the nightlies. It's also probably better than Icinga in that regard, as there isn't really a 'recovery' state to return to (except on the next night) - it'd just alert whenever there's a problem.
So that's a very quick solution I can put together, but it does mean it's not as 'self-contained'.
Let me know your thoughts.
Related to https://github.com/freedomofpress/securedrop-workstation-ci/issues/77 - we should have monitoring that the `run.py --version 4.2 --update --save` process succeeds.

@mig5 would this be something you'd like to integrate into Icinga? Otherwise we could keep it self-contained to this repo and have it send Slack messages if the process fails.