Open garlick opened 1 month ago
Some investigation is needed on if/how sqlite can recover from this, starting with recreating the situation a few times and cataloging the failure modes.
Other random thoughts
content-files may be an easy back end to prototype a recovery behavior and write tests for. E.g. when it gets ENOSPC on write, unlink the object and queue store requests until there is free space.
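A rough sketch of that unlink-and-queue idea, in Python rather than the C the real module would need. Everything here is hypothetical (the `QueueingStore` class, the `writer` hook, `retry_pending`) and is not based on the actual content-files code; it only illustrates the control flow. The writer is injectable so a test can simulate ENOSPC without filling a disk.

```python
import errno
import os
from collections import deque


def write_file(path, data):
    """Default writer: create the object file and write all of data."""
    with open(path, "wb") as f:
        f.write(data)


class QueueingStore:
    """Hypothetical store that defers writes while the file system is full."""

    def __init__(self, directory, writer=write_file):
        self.directory = directory
        self.writer = writer      # injectable so ENOSPC can be simulated
        self.pending = deque()    # (key, data) requests awaiting free space

    def _try_write(self, key, data):
        path = os.path.join(self.directory, key)
        try:
            self.writer(path, data)
            return True
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
            # Remove any partially written object so it is never read back.
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
            return False

    def store(self, key, data):
        """Return True if written now, False if queued for later."""
        # Preserve ordering: once anything is queued, queue everything.
        if self.pending or not self._try_write(key, data):
            self.pending.append((key, data))
            return False
        return True

    def retry_pending(self):
        """Retry queued requests in order; stop at the first ENOSPC.

        Returns the number of requests still queued.
        """
        while self.pending:
            key, data = self.pending[0]
            if not self._try_write(key, data):
                break
            self.pending.popleft()
        return len(self.pending)
```

Open questions this leaves out: what triggers `retry_pending` (a timer, a statfs poll), how large the queue may grow in memory, and how loads of queued-but-unwritten objects should be answered.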
Problem: when the file system containing statedir on rank 0 fills up, content-sqlite propagates errors back to the content cache, which may not handle them well (#5978). Potentially content-sqlite itself could navigate this situation without ever returning errors to the content cache.
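One cheap way to start cataloging sqlite's behavior without filling a real file system might be to cap the database size with `PRAGMA max_page_count`, which makes inserts fail with SQLITE_FULL ("database or disk is full"). This is only an approximation: a genuine ENOSPC can also hit the journal or surface as an I/O error, so it does not cover every failure mode. The sketch below uses Python's `sqlite3` module; the table schema, the `store_with_retry` helper, and the `make_room` callback are assumptions for illustration, not content-sqlite's actual code.

```python
import os
import sqlite3
import tempfile


def store_with_retry(conn, key, blob, make_room):
    """Insert a row; on a 'full' error, roll back, call make_room(), retry once."""
    sql = "INSERT INTO objects (hash, object) VALUES (?, ?)"
    try:
        conn.execute(sql, (key, blob))
        conn.commit()
        return "stored"
    except sqlite3.OperationalError as e:
        # SQLITE_FULL is reported as OperationalError; matching on the
        # message keeps this portable across Python versions.
        if "full" not in str(e):
            raise
        conn.rollback()   # harmless if sqlite already rolled back
        make_room()       # stand-in for waiting until space is available
        conn.execute(sql, (key, blob))
        conn.commit()
        return "stored after retry"


path = os.path.join(tempfile.mkdtemp(), "content.sqlite")
conn = sqlite3.connect(path)
# Assumed schema, for illustration only.
conn.execute("CREATE TABLE objects (hash TEXT PRIMARY KEY, object BLOB)")
conn.commit()

# Cap the database at 16 pages to simulate a full file system.
conn.execute("PRAGMA max_page_count = 16").fetchall()


def make_room():
    # Simulate space being freed by raising the page limit.
    conn.execute("PRAGMA max_page_count = 1024").fetchall()


results = []
for i in range(64):
    results.append(store_with_retry(conn, f"key{i}", b"x" * 4096, make_room))

row_count = conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
print(row_count, results.count("stored after retry"))
```

In a real module the retry would presumably be deferred rather than immediate, with the store request held until space is available, so the content cache never sees the error.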