flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

consider preserving data from subinstances #6080

Open garlick opened 1 week ago

garlick commented 1 week ago

Problem: after a batch job completes, we lose the KVS content and hence all the information about what ran in the instance.

We already have the flux batch --dump option:

       --dump=[FILE]
              When the job script is complete, archive the Flux instance's KVS
              content to FILE, which should have a suffix known to
              libarchive(3), and may be a mustache template as described above
              for --output. The content may be unarchived directly or examined
              within a test instance started with the flux-start --recovery
              option. If FILE is unspecified, flux-{{jobid}}-dump.tgz is used.

Should we consider enabling this by default?

We could also consider garbage-collecting the KVS content before writing it out.

Also, some tooling beyond flux start --recovery could be developed to allow the dump file to be queried.

grondo commented 1 week ago

Related #5952