flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

content-sqlite: treat ENOSPC as a transient condition #6010

Open garlick opened 1 month ago

garlick commented 1 month ago

Problem: when the file system containing statedir on rank 0 fills up, content-sqlite propagates errors back to the content cache, which may not handle them well (#5978).

Potentially content-sqlite itself could navigate this situation without ever returning errors to the content cache.
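
As a rough illustration of what "navigating this situation" could look like, here is a minimal Python sketch of a retry-with-backoff wrapper around a store operation. It is not flux-core code: `store_with_retry` and `is_disk_full` are hypothetical names, and it assumes the out-of-space condition surfaces from SQLite as `SQLITE_FULL` (result code 13), which would need to be confirmed for a real ENOSPC.

```python
import sqlite3
import time

SQLITE_FULL = 13  # primary result code for "database or disk is full"


def is_disk_full(exc):
    """Heuristic check that a sqlite3 error looks like SQLITE_FULL."""
    # sqlite_errorcode is only set on Python 3.11+; otherwise use the message.
    code = getattr(exc, "sqlite_errorcode", None)
    if code is not None:
        return (code & 0xFF) == SQLITE_FULL
    return "disk is full" in str(exc)


def store_with_retry(op, retries=5, delay=0.1, sleep=time.sleep):
    """Call op(); on a disk-full error, back off and retry (hypothetical helper).

    Any other error, or a disk-full error on the final attempt, is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return op()
        except sqlite3.OperationalError as exc:
            if not is_disk_full(exc) or attempt == retries:
                raise
            sleep(delay * 2 ** attempt)  # exponential backoff
```

A bounded retry like this only hides short outages. Whether the caller should block indefinitely, or eventually give up and return the error, is a separate design question.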

garlick commented 4 days ago

Some investigation is needed into whether and how sqlite can recover from this, starting with recreating the situation a few times and cataloging the failure modes.
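
One cheap way to start recreating the condition, sketched below in Python, is to cap the database size with `PRAGMA max_page_count` so that SQLite reports "database or disk is full" without actually filling a file system. This is only a stand-in: a real ENOSPC on statedir may fail differently (for example while writing the journal), so it complements rather than replaces a full-disk test. The table layout and the `fill_until_full` helper are hypothetical, not the content-sqlite schema.

```python
import os
import sqlite3
import tempfile


def fill_until_full(path, max_pages=32, blob_size=4096, limit=10_000):
    """Insert blobs until SQLite reports an error; return (count, message)."""
    conn = sqlite3.connect(path)
    try:
        # Cap the database at max_pages pages for this connection.
        conn.execute(f"PRAGMA max_page_count = {int(max_pages)}")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS objects "
            "(id INTEGER PRIMARY KEY, object BLOB)"
        )
        stored = 0
        try:
            for _ in range(limit):  # safety cap in case the limit never trips
                conn.execute(
                    "INSERT INTO objects (object) VALUES (?)",
                    (b"x" * blob_size,),
                )
                conn.commit()
                stored += 1
        except sqlite3.OperationalError as exc:
            conn.rollback()  # drop any partial transaction
            return stored, str(exc)
        return stored, None
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(fill_until_full(os.path.join(tmp, "content.sqlite")))
```

Running it with different `max_pages` and `blob_size` values, and then checking whether the same connection can still read and write once the cap is raised, would be one way to begin cataloging failure modes.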

Other random thoughts