flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

idea: stream status updates from flux-resource or other tools #4792

Open chu11 opened 1 year ago

chu11 commented 1 year ago

Partly from a hallway conversation I had with @morrone.

I added a whatsup option called --monitor a long time ago. The idea is you run

> whatsup --monitor

at the end of the day, and it'll output things like "node123 (10/22/22 9:00PM): down" (can't remember the exact format, but that's the basics) when things go down/up during the night. You come in in the morning and you get a nice mini status in your terminal for what happened when you were gone.

A similar option could be useful with flux-resource or some other tools.

BUT, the additional benefit is that if we add this, the underlying event stream that implements it could also serve as a friendlier event streaming service for #4569.

garlick commented 1 year ago

We do have a resource eventlog in the KVS that can be watched, e.g. in raw form:

$ sudo flux kvs eventlog get -w resource.eventlog
1669817412.474601 resource-init {"restart":true,"drain":{},"online":"","exclude":"0"}
1669817412.476516 resource-define {"method":"configuration"}
1669817414.620857 online {"idset":"0"}
1669914360.522472 online {"idset":"1"}
1669914360.672504 online {"idset":"2"}
1669914360.986879 online {"idset":"3-5"}
1669914361.136419 online {"idset":"6"}
1669914417.560495 offline {"idset":"6"}
1669914538.094525 drain {"idset":"7","reason":"testing drain","overwrite":0}
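
As a rough illustration of how the raw eventlog above could be turned into the kind of status lines `whatsup --monitor` prints, here is a minimal sketch. It assumes each line has the form `<timestamp> <name> <JSON context>` as in the output above, and that only events whose context carries an `idset` are of interest. The function names (`parse_event`, `format_status`, `monitor`) and the output format are hypothetical, not an existing Flux tool or API.

```python
import json
from datetime import datetime, timezone


def parse_event(line):
    """Split one raw eventlog line into (timestamp, name, context)."""
    parts = line.strip().split(None, 2)
    timestamp = float(parts[0])
    name = parts[1]
    # The context is a JSON object; treat a missing one as empty.
    context = json.loads(parts[2]) if len(parts) > 2 else {}
    return timestamp, name, context


def format_status(line):
    """Return a one-line status string, or None for events without an idset."""
    timestamp, name, context = parse_event(line)
    if "idset" not in context:
        return None  # e.g. resource-init, resource-define
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    status = f"idset {context['idset']} ({when:%Y-%m-%d %H:%M:%S} UTC): {name}"
    # Drain events in the sample carry a human-readable reason.
    if context.get("reason"):
        status += f" ({context['reason']})"
    return status


def monitor(lines):
    """Yield status strings for an iterable of raw eventlog lines."""
    for line in lines:
        if not line.strip():
            continue
        status = format_status(line)
        if status is not None:
            yield status


# A few lines copied from the eventlog output above.
SAMPLE = [
    '1669817412.476516 resource-define {"method":"configuration"}',
    '1669914361.136419 online {"idset":"6"}',
    '1669914417.560495 offline {"idset":"6"}',
    '1669914538.094525 drain {"idset":"7","reason":"testing drain","overwrite":0}',
]

for status in monitor(SAMPLE):
    print(status)
```

To follow a live instance, the same `monitor()` could be fed `sys.stdin` in a small script, with the output of `flux kvs eventlog get -w resource.eventlog` piped into it.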

grondo commented 1 year ago

This has nothing to do with #4569 though, since that issue deals with job events. Were you thinking a utility or service that would aggregate all known eventlogs into a single event stream for a consumer? (would need to be instance owner only)

chu11 commented 1 year ago

> Were you thinking a utility or service that would aggregate all known eventlogs into a single event stream for a consumer?

Ahh, I forgot that #4569 was specific to job events. @morrone and I were talking about the potential of other event streams as well. We didn't discuss aggregating them all into one, just their general availability.