fermitools / poms

The Production Operations Management System (POMS) is a service that assists the production teams and analysis groups of experiments with their MC production and data processing.
https://github.com/fermitools/poms/wiki

Display ongoing outage information prominently #10

Open retzkek opened 1 year ago

retzkek commented 1 year ago

If there's an ongoing outage or degradation of jobsub or the batch system, it would be very useful to users to display that prominently somewhere on the POMS interface, perhaps as a banner.

Outage information is in SNOW and Landscape; I'd probably want to put it into Lens to provide a consistent API. In any case, it should be just a GET request away.
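As a rough sketch of what "a GET request away" could look like on the POMS side: the endpoint URL, the JSON shape (a list of objects with `service`, `status`, and `summary` keys), and the function names below are all assumptions for illustration, not an existing Lens, Landscape, or SNOW API.

```python
import json
import urllib.request

# Hypothetical endpoint; the real outage API is not specified in this thread.
OUTAGE_URL = "https://lens.example.invalid/api/outages"


def fetch_outages(url=OUTAGE_URL, timeout=5):
    """Return a list of outage dicts, or [] if the lookup fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # A failed outage lookup should never break the page itself.
        return []


def banner_text(outages, services=("jobsub", "batch")):
    """Build banner text from ongoing outages affecting the given services.

    Assumes each outage is a dict with 'service', 'status', and an
    optional 'summary' key (an invented schema). Returns None when
    there is nothing to show.
    """
    relevant = [
        o for o in outages
        if o.get("status") == "ongoing" and o.get("service") in services
    ]
    if not relevant:
        return None
    return "; ".join(
        f"{o['service']}: {o.get('summary', 'outage in progress')}"
        for o in relevant
    )
```

A page handler could call `banner_text(fetch_outages())` and render the banner only when the result is not `None`. Keeping the fetch separate from the formatting means the banner logic can be exercised without network access.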

marcmengel commented 1 year ago

I should note that in the early designs you could manually hold (queue) all POMS launches during major job submission or storage outages, and then release them all later when things woke up. I think that code is still lurking in the corners somewhere. Whatever watches for outages and reports them could also hold POMS launches until the outage is resolved, if the outage would otherwise make job launches fail.

https://github.com/fermitools/poms/blob/main/webservice/SubmissionsPOMS.py#L983
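To make the hold-and-release idea concrete, here is a minimal sketch of a gate that queues launches while an outage is active and runs them in order once it clears. The `LaunchGate` class and its methods are hypothetical names for illustration and do not reflect the existing code in `SubmissionsPOMS.py`.

```python
from collections import deque


class LaunchGate:
    """Hypothetical gate that queues launches while an outage hold is active."""

    def __init__(self, launch):
        self._launch = launch   # callable that performs the real launch
        self._held = deque()    # launches waiting for release, oldest first
        self.holding = False

    def set_outage(self, active):
        """Turn the hold on or off; turning it off runs queued launches in order.

        Returns the list of results from any launches released by this call.
        """
        self.holding = active
        released = []
        while not self.holding and self._held:
            released.append(self._launch(self._held.popleft()))
        return released

    def submit(self, launch_id):
        """Launch immediately, or queue the launch if a hold is active."""
        if self.holding:
            self._held.append(launch_id)
            return None
        return self._launch(launch_id)
```

Whatever polls for outages would call `set_outage(True)` when an outage that affects submissions begins and `set_outage(False)` when it is resolved; everything submitted in between is launched at release time rather than failing.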