flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

content: flush data if backing store is loaded at a later time #6182

Open chu11 opened 1 month ago

chu11 commented 1 month ago

Presently in the content cache, if a backing store is not configured, a cache store request returns ENOSYS to the caller. The entry is stored in memory and is never retried against the backing store. This is the case even if a backing store is loaded/configured at a later time.

The entry should be tracked and retried at a later time if a backing store is loaded.

May be supported in conjunction with #6010.
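To make the proposed behavior concrete, here is a minimal sketch in plain C of the idea: keep entries flagged dirty when no backing store is present, and flush them when one is loaded later. This is not flux-core code; all names (`struct cache`, `cache_store`, `cache_backing_loaded`, etc.) are invented for illustration, and real blob hashes, RPC plumbing, and error handling are omitted.

```c
/* Hypothetical sketch, NOT the flux-core content cache: entries stored
 * while no backing store is configured stay flagged dirty, and a later
 * "backing store loaded" event retries them. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 16

struct entry {
    int id;             /* stands in for the blob hash */
    bool dirty;         /* not yet written to the backing store */
};

struct cache {
    struct entry entries[MAX_ENTRIES];
    size_t count;
    bool backing_loaded;
    int flushed;        /* entries written to the backing store */
};

/* Store an entry; if there is no backing store, keep it dirty for a
 * later retry instead of dropping the flush forever. */
static void cache_store (struct cache *c, int id)
{
    struct entry *e = &c->entries[c->count++];
    e->id = id;
    if (c->backing_loaded) {
        c->flushed++;
        e->dirty = false;
    }
    else
        e->dirty = true;    /* remember it for later */
}

/* Called when a backing store module is loaded at a later time:
 * walk the cache and flush anything still dirty. */
static void cache_backing_loaded (struct cache *c)
{
    c->backing_loaded = true;
    for (size_t i = 0; i < c->count; i++) {
        if (c->entries[i].dirty) {
            c->flushed++;
            c->entries[i].dirty = false;
        }
    }
}
```

The point of the sketch is only the bookkeeping: the dirty flag is what the current code lacks once the store request has failed with ENOSYS.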

garlick commented 1 month ago

presently in the content cache, if a backing store is not configured, a cache store request returns ENOSYS to the caller.

I think you meant a content.flush request returns ENOSYS.

We should probably discuss what we need to support here. The earlier design allowed backing stores to be unloaded and reloaded. In fact you could unload one type and load another type and everything would work, because we forced blobs in the first store to be loaded into the cache during the unload process. That was interesting during early prototyping but really wasn't a practical use case, as it risks OOMing the broker. When the content cache code was refactored, support for it was dropped. I was thinking at that time that we would never unload a backing store module.

One question is whether we now have a use case that motivates supporting that. On recovering from a corrupt sqlite database: we could add something to the content-sqlite module to perform some recovery process while loaded, holding load/store requests in the meantime. Or to temporarily close the db for manual recovery and resume later. IOW the backing module could remain loaded, and the cache may not need to deal with this case.

I'm a little concerned about making the cache any more complicated, so I wanted to raise that possibility.

chu11 commented 1 month ago

I think you meant a content.flush request returns ENOSYS.

Nope, I was specifically thinking about cache.store, b/c in the continuation...

    if (content_store_get_hash (f, &hash, &hash_size) < 0) {
        if (cache->rank == 0 && errno == ENOSYS) {
            flux_log (cache->h,
                      LOG_DEBUG,
                      "content store: %s",
                      "backing store service unavailable");
        }
        else {
            flux_log (cache->h,
                      LOG_CRIT,
                      "content store: %s",
                      strerror (errno));
        }
        goto error;
    }

so I think the caller gets ENOSYS if the backing store isn't there.

We should probably discuss what we need to support here.

I think the main thing is that if the backing store isn't loaded (it is temporarily unloaded to repair something, or it is set up after the content cache is loaded, etc.) we don't want cache.store requests to go permanently unbacked. That is the current behavior. We'd like the data to eventually be backed up.

Without thinking about it too deeply, I would think at least some of this has to be handled at the content cache level. At the barest minimum, the backing module may not know about certain cache.store requests if it was temporarily not loaded?

garlick commented 1 month ago

Nope, I was specifically thinking about cache.store, b/c in the continuation...

Ah sorry, I knew we supported running w/o a backing store e.g. t0028-content-backing-none.t, but I had forgotten that the reason that works is the store requests are never initiated from rank 0 if there is no backing store.

I think the main thing is if the backing store isn't loaded (it is temporarily unloaded to repair something, or if it is setup after the content cache is loaded, etc.) we don't want the cache.store requests to never be backed up. That is the current behavior. We'd like it to eventually back up the data.

Right, I'm asking if we need to support those situations. To survive taking the backing store offline for repair, you not only have to write out dirty blobs on restarting, but also pause any blob faults. My point was that case may be easier to handle within the backing store module itself where you could just pause all load/store requests while repair occurs.
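The "pause all load/store requests while repair occurs" idea could be sketched, in plain C, as a backing module that queues incoming requests while paused and services the backlog on resume. This is an invented illustration under stated assumptions, not the flux-core module API; `struct backing`, `backing_pause`, and friends are hypothetical names, and real request objects would be held messages, not integer ids.

```c
/* Hypothetical sketch, NOT the flux-core backing store API: while
 * "paused" for repair, requests are queued rather than serviced, and
 * backing_resume () drains the backlog. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 32

struct backing {
    bool paused;
    int pending[QLEN];  /* queued request ids */
    size_t npending;
    int serviced;       /* requests actually handled */
};

/* Handle a load/store request, or queue it if paused for repair. */
static void backing_request (struct backing *b, int reqid)
{
    if (b->paused)
        b->pending[b->npending++] = reqid;
    else
        b->serviced++;
}

/* Enter repair mode: hold all incoming requests. */
static void backing_pause (struct backing *b)
{
    b->paused = true;
}

/* Repair done: service everything that queued up in the meantime. */
static void backing_resume (struct backing *b)
{
    b->paused = false;
    for (size_t i = 0; i < b->npending; i++)
        b->serviced++;
    b->npending = 0;
}
```

Handled this way, the content cache never sees the outage at all; requests simply complete late, which is the advantage garlick is pointing at.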

The "whoops I thought I could run with no backing store but now I'm out of memory and want one" case is not super likely IMHO. There's not much benefit to running without a backing store.

chu11 commented 1 month ago

My point was that case may be easier to handle within the backing store module itself where you could just pause all load/store requests while repair occurs.

Initially it would be easier to handle at the content cache level. The main reason being that we would have to implement "special handling" within each backing store module. i.e. perhaps it's easier to handle repair in sqlite when it is "offline", so the admin just unloads the module, does manual repair, and reloads it. Similarly for the files backend and perhaps future backing modules?

But thinking about it for 10 more seconds, perhaps adding some type of "pause" and "unpause" RPC target to each backing module wouldn't be that much trouble.

The "whoops I thought I could run with no backing store but now I'm out of memory and want one" case is not super likely IMHO. There's not much benefit to running without a backing store.

Yeah, now that you mention it, perhaps it's a case not worth dealing with.

garlick commented 1 month ago

If we go that route, we should think about making it an optional feature, or even a sqlite-specific feature, since the extra work in the other backing stores that aren't really used much is probably not justified.

But plumbing for repair might not be super helpful if we don't have a strategy for recovering a damaged sqlite db. Any progress determining the failure modes and remediations when ENOSPC is encountered at different points during a db transaction?

chu11 commented 1 month ago

But plumbing for repair might not be super helpful if we don't have a strategy for recovering a damaged sqlite db. Any progress determining the failure modes and remediations when ENOSPC is encountered at different points during a db transaction?

That's on my todo for this week :-)

chu11 commented 1 month ago

But plumbing for repair might not be super helpful if we don't have a strategy for recovering a damaged sqlite db.

Just a side note given the conversation in #6193. I had forgotten content-sqlite/kvs garbage collection is done offline. So the ability to recover from sqlite ENOSPC errors while flux is running is somewhat limited.

I suppose hypothetically some kvs/backing-store garbage collection could be done online if we did some big "pause everything for a bit"? But that's a pretty big thing.

garlick commented 1 month ago

I don't think garbage collection would be effective in the kind of situations we've seen - terabyte log files, huge crashdumps. In the log file case, any space we free up would just be consumed.

I don't see freeing up space in this situation as our job. But not falling apart would be good.