Open chu11 opened 1 month ago
oops ... wrong place. moving to flux-core
Maybe we could track the elapsed time for each transaction and implement a dynamically configurable timeout?
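To make the idea concrete, here is a rough sketch of per-transaction elapsed-time tracking with a timeout that can be changed at runtime. This is not flux-core code: `TransactionTimer`, its methods, and the transaction ids are all made up for illustration, and how the KVS would actually hook this in is left open.

```python
import time


class TransactionTimer:
    """Hypothetical helper: records when each transaction started and
    reports which in-flight transactions have exceeded the timeout."""

    def __init__(self, timeout_s, clock=time.monotonic):
        # timeout_s is a plain attribute so it can be reassigned at runtime.
        self.timeout_s = timeout_s
        self._clock = clock
        self._started = {}

    def start(self, txn_id):
        # Remember the start time of a transaction.
        self._started[txn_id] = self._clock()

    def finish(self, txn_id):
        # Forget a transaction once it has completed.
        self._started.pop(txn_id, None)

    def elapsed(self, txn_id):
        # Seconds since the transaction started.
        return self._clock() - self._started[txn_id]

    def overdue(self):
        # Ids of in-flight transactions running longer than timeout_s.
        now = self._clock()
        return sorted(
            txn for txn, began in self._started.items()
            if now - began > self.timeout_s
        )
```

A periodic check could call `overdue()` and log or abort the offenders. Lowering `timeout_s` while a transaction is in flight takes effect on the next check.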
I looked at a core dump to see if I could glean any extra details, but eventually gave up because there's too much implementation-specific stuff in jansson (you need a container_of(), and then there's an internal hash table for storing all keys, etc.).
The interesting bit is that the above was hanging on an eventlog key. I don't know if we restarted the flux-broker before https://github.com/flux-framework/flux-sched/pull/1250 and https://github.com/flux-framework/flux-core/pull/6115 were fixed. That might explain the possibly gigantic eventlog?
No, the extra events only went to the journal, not the KVS.
Recently a large job put a heavy load on the KVS, and the KVS was unable to make progress.
It would be useful if there were some way to "kill" a currently processing transaction that is slowing down the KVS.
Note that in some cases this kill may not work, like if a kajillion-entry array was passed to the KVS.
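A minimal sketch of why that caveat holds, assuming the kill is cooperative, i.e. the processing loop only checks a cancel flag between operations. `process_transaction`, `apply_op`, and the use of `threading.Event` as the kill switch are illustrative assumptions, not how the KVS is actually structured.

```python
import threading


def process_transaction(ops, apply_op, cancel):
    """Apply ops in order, checking the cancel flag before each one.

    Returns the number of operations that were applied. The flag is only
    consulted between operations, so a single huge operation runs to
    completion even if a kill arrives while it is being applied.
    """
    applied = 0
    for op in ops:
        if cancel.is_set():
            break  # kill requested: stop before starting the next op
        apply_op(op)
        applied += 1
    return applied


if __name__ == "__main__":
    cancel = threading.Event()

    def apply_op(op):
        # Simulate a kill request arriving while op 2 is being applied.
        if op == 2:
            cancel.set()

    print(process_transaction([1, 2, 3], apply_op, cancel))
```

With many small operations the kill takes effect at the next check. With one giant operation there is no next check, which matches the kajillion-entry array case.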