Open grondo opened 5 years ago
Sounds good!
Might want to add `debug.*` like in the primary eventlog. That turned out to be kind of handy.
This may be useful for logging PMI timing info, e.g.
debug.pmi.init.entry
debug.pmi.init.exit
debug.pmi.barrier.entry
debug.pmi.barrier.exit
etc.
No need to specify that, just pointing out a use for debug entries.
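As a rough illustration of how such entries could be consumed, here is a minimal sketch that pairs `debug.<phase>.entry` and `debug.<phase>.exit` events and reports the elapsed time per phase. It assumes each eventlog entry is a single-line JSON object with `timestamp` and `name` keys; the helper name `phase_durations` and the sample timestamps are made up for the example and are not part of any existing tool.

```python
import json


def phase_durations(eventlog_text):
    """Return {phase: seconds} for matching debug.*.entry / debug.*.exit pairs.

    Assumption: one JSON object per line, each with "timestamp" and "name".
    """
    starts = {}
    durations = {}
    for line in eventlog_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        name = event["name"]
        if not name.startswith("debug."):
            continue
        # Split "debug.pmi.init.entry" into ("debug.pmi.init", "entry").
        phase, _, edge = name.rpartition(".")
        if edge == "entry":
            starts[phase] = event["timestamp"]
        elif edge == "exit" and phase in starts:
            durations[phase] = event["timestamp"] - starts.pop(phase)
    return durations


# Hypothetical sample data, not taken from a real job.
sample = "\n".join([
    json.dumps({"timestamp": 100.0, "name": "debug.pmi.init.entry"}),
    json.dumps({"timestamp": 100.25, "name": "debug.pmi.init.exit"}),
    json.dumps({"timestamp": 101.0, "name": "debug.pmi.barrier.entry"}),
    json.dumps({"timestamp": 103.5, "name": "debug.pmi.barrier.exit"}),
])
print(phase_durations(sample))
```

Keeping the `entry`/`exit` suffix as the last name component makes the pairing a simple string split, which is one argument for the naming scheme suggested above.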
RFC 16 requires that the exec system create an `exec.eventlog` under the guest KVS namespace for use by the job shells.

We should document a minimal set of required events for this eventlog. Though the job shell is a user-replaceable component, it should perhaps adhere to some minimal standard behavior so that tools and apps may synchronize and interact with conforming job shells in a predictable manner.
Also, the exec system itself could probably dump some events of its own into this eventlog, e.g. an initial `init` event to denote creation of the eventlog, and a final `done` or `end` event as the terminating event. `cleanup.start` and `cleanup.finish` events might also be useful to indicate when cleanup tasks on ranks were started and completed (though maybe this doesn't belong in the user-level eventlog?)

To start, we could define these well-known job shell events:
- `starting` - initial event logged by distributed job shell
- `running` - all job shells have started execution of all current tasks
- `stopped` - all tasks are stopped (e.g. waiting for debugger attach)
- `exit` - one or more tasks have exited (context could include aggregated exit codes), there may be multiple exit events
- `complete` - all tasks have exited (final entry from distributed job shell)
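To make the idea of a "conforming" job shell a bit more concrete, a tool could check the order of shell events along these lines. This is only a sketch: the function name `check_shell_events` is hypothetical, and the ordering rules (`starting` first, `complete` last, anything else in between) are my reading of the descriptions above rather than an agreed spec. Events from other sources, such as `init`, `done`, or `debug.*`, are ignored.

```python
SHELL_EVENTS = {"starting", "running", "stopped", "exit", "complete"}


def check_shell_events(names):
    """Return a list of problems in a sequence of event names (empty if OK).

    Assumed rules: the first shell event is 'starting' and the last
    shell event is 'complete'. Multiple 'exit' events are allowed.
    """
    # Keep only the well-known job shell events, preserving order.
    shell = [n for n in names if n in SHELL_EVENTS]
    if not shell:
        return ["no job shell events found"]
    problems = []
    if shell[0] != "starting":
        problems.append("first shell event is not 'starting'")
    if "complete" not in shell:
        problems.append("missing 'complete' event")
    elif shell.index("complete") != len(shell) - 1:
        problems.append("shell events logged after 'complete'")
    return problems


print(check_shell_events(
    ["init", "starting", "running", "exit", "exit", "complete", "done"]
))
```

A checker like this only needs event names, so it would work the same whether or not the `exit` context carries aggregated exit codes.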