containers / storage

Container Storage Library
Apache License 2.0

layers: allow mount in ro stores #1944

Closed. giuseppe closed this pull request 3 weeks ago.

giuseppe commented 3 weeks ago

There is nothing conceptually wrong with this, as Mount itself does not modify the store.

newFileGetter() works today with overlay only because it uses the drivers.DiffGetterDriver shortcut that reads files directly from their storage.

Graph drivers that do not allow that need to mount the layer instead. composefs could be changed to use the same code as overlay, but that requires more refactoring to propagate the TOC information down to the overlay layer. For now, just fix mounting from a ro store, which newFileGetter() already assumes works anyway.
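
For illustration, here is a minimal Go sketch of that decision, with hypothetical interfaces that only approximate the real drivers.DiffGetterDriver capability: use the driver's direct file access when it offers one, otherwise fall back to mounting the layer.

```go
// Minimal sketch, not containers/storage code: pick how to read a layer's files.
package main

import (
	"errors"
	"fmt"
)

// diffGetterDriver approximates the optional capability some graph drivers
// (e.g. overlay) expose: reading layer files straight from their on-disk
// storage without mounting. Hypothetical interface for illustration only.
type diffGetterDriver interface {
	DiffGetter(layerID string) (string, error) // returns a directory to read from
}

// mounter approximates the fallback path: mount the layer and read from the
// mount point. Hypothetical interface for illustration only.
type mounter interface {
	Mount(layerID string) (string, error)
}

// fileGetterRoot returns a readable root for a layer, preferring direct access.
func fileGetterRoot(drv any, layerID string) (string, error) {
	if dg, ok := drv.(diffGetterDriver); ok {
		// Fast path: the driver can hand us the files directly.
		return dg.DiffGetter(layerID)
	}
	if m, ok := drv.(mounter); ok {
		// Slow path: mount the layer; mounting does not modify the store
		// itself, so this is acceptable even for a ro store.
		return m.Mount(layerID)
	}
	return "", errors.New("driver supports neither direct access nor mounting")
}

type fakeOverlay struct{}

func (fakeOverlay) DiffGetter(id string) (string, error) {
	return "/var/lib/containers/storage/overlay/" + id + "/diff", nil
}

func main() {
	root, err := fileGetterRoot(fakeOverlay{}, "abc123")
	fmt.Println(root, err)
}
```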

openshift-ci[bot] commented 3 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

giuseppe commented 3 weeks ago

the current code just crashes at the moment, because mountsLockfile is nil in an additional layer store.

I think the long-term fix is to teach overlay how to look up files in the composefs store without having to mount the layer (we need to somehow propagate the TOC to the overlay layer so it can look up the file digest).

The current PR is just a band-aid to get composefs tests to pass in the Podman CI: https://github.com/containers/podman/pull/22888

mtrmac commented 3 weeks ago

I’m rather opposed to band-aids that introduce locking problems, and/or incomplete/inconsistent behavior involving locks. I don’t think there will ever be a more convenient time to get that fixed; so if we are serious about locking, we must be serious now. And I think we must be serious about locking.


If the code currently crashes, replace that with cleanly returning an error, on all similarly relevant code paths. That’s a clear improvement. Then we can see about any other improvements.
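
A sketch of what cleanly returning an error could look like, with hypothetical types rather than the actual PR diff:

```go
// Minimal sketch: guard the nil mounts lockfile instead of dereferencing it.
package main

import (
	"errors"
	"fmt"
)

type lockfile struct{} // stand-in for the real mounts lockfile

func (l *lockfile) Lock()   {}
func (l *lockfile) Unlock() {}

type layerStore struct {
	mountsLockfile *lockfile // nil in an additional/read-only layer store
}

var errNoMountLock = errors.New("layer store does not support mounting (no mounts lockfile)")

func (r *layerStore) mount(id string) (string, error) {
	if r.mountsLockfile == nil {
		// Previously this path crashed with a nil dereference; returning a
		// clear error is the minimal improvement suggested above.
		return "", errNoMountLock
	}
	r.mountsLockfile.Lock()
	defer r.mountsLockfile.Unlock()
	// ... perform the mount and update the mount bookkeeping here ...
	return "/mnt/" + id, nil
}

func main() {
	roStore := &layerStore{mountsLockfile: nil}
	if _, err := roStore.mount("abc123"); err != nil {
		fmt.Println("mount refused:", err)
	}
}
```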

giuseppe commented 3 weeks ago

looking at the issue:

    // LOCKING BUG: This is reachable via store.Diff → layerStore.Diff → layerStore.newFileGetter
    // (with btrfs and zfs graph drivers) holding layerStore only locked for reading, while it modifies
    // - r.layers[].MountCount (directly and via loadMounts / saveMounts)
    // - r.layers[].MountPoint (directly and via loadMounts / saveMounts)
    // - r.bymount (via loadMounts / saveMounts)

shouldn't these just be protected by mountsLockfile as they are?

mtrmac commented 3 weeks ago

There’s a kernel of an idea here — maybe we could make the in-memory storage more granular, with some fields covered by a generic layer store lock, and some other fields only by a mount lock. (I would worry about the two getting completely out of sync, e.g. WRT layer existence.)

But, fundamentally, layerStore.Diff is currently documented to only require startReading, which means it is potentially concurrent with other operations (including concurrent Diff calls), and outright not allowed to modify any layer-store state at all. It’s “easy” to change that, but I vaguely remember holding a shared-lock here being important for performance.
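
To make that idea concrete, a purely hypothetical sketch of such a split (not the current c/storage layout): some fields would be covered by the layer store lock, others only by a mounts lock, with the out-of-sync risk noted in the comments.

```go
// Hypothetical sketch of per-field lock ownership; not the real layerStore.
package main

import (
	"fmt"
	"sync"
)

type layerRecord struct {
	// Covered by the layer store lock; a read lock is enough to read them.
	ID     string
	Parent string

	// Covered only by the mounts lock. Readers holding just the store read
	// lock must not touch these; the risk is the two views drifting apart,
	// e.g. mount state surviving a layer that was deleted under the other lock.
	MountPoint string
	MountCount int
}

type layerStore struct {
	storeMu  sync.RWMutex // guards layer existence, ID, Parent
	mountsMu sync.Mutex   // guards MountPoint, MountCount, bymount
	layers   map[string]*layerRecord
	bymount  map[string]*layerRecord
}

func main() {
	s := &layerStore{
		layers:  map[string]*layerRecord{},
		bymount: map[string]*layerRecord{},
	}

	// A Diff-style reader only needs the store read lock for ID/Parent.
	s.storeMu.RLock()
	_, ok := s.layers["abc"]
	s.storeMu.RUnlock()

	// Mount bookkeeping takes the mounts lock instead.
	s.mountsMu.Lock()
	// ... update MountPoint/MountCount and s.bymount here ...
	s.mountsMu.Unlock()

	fmt.Println("layer present:", ok)
}
```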

giuseppe commented 3 weeks ago
> It can’t currently be mountsLockFile because (related to this PR) read-only stores might not have that lock at all, and/or might not be able to lock it for writing.

I don't understand why it would be a problem. If mountsLockFile is not usable, then it must not be possible to mount an image (the code in the PR returns an error now: https://github.com/containers/storage/pull/1944/files#diff-04662d0ee53d0732912a23987c7631b5d07aa429ea69c18e1da00903a6362904R1580-R1581).

We can improve it further by teaching DiffGetter how to use composefs. Once that happens, there won't be any need to use mountsLockFile, since we can access the files in a layer without having to mount it, as already happens with overlayfs. But until then, I think it is fine to use the naive approach: mount the image to access its files, and require a writable mountsLockFile to do so.

I think we already have a granular enough locking mechanism now, where mountsLockFile protects only the mount information. Anyway, mount.Mounted() is the ultimate truth about whether a mount exists or not, since the process could have been killed between performing the mount/unmount and updating the mounts info. This was the cause of a long-standing Podman race condition, where the podman cleanup process was terminated before it could update the mounts info.
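
As a sketch of that reconciliation, assuming github.com/moby/sys/mountinfo as a stand-in for the mount.Mounted() helper mentioned above (the real code is more involved):

```go
// Minimal sketch: trust the kernel mount table over stale mountpoints metadata.
package main

import (
	"fmt"

	"github.com/moby/sys/mountinfo"
)

// reconcile compares the recorded mount count against the kernel state and
// returns the count the stored metadata should be corrected to.
func reconcile(mountPoint string, recordedCount int) (int, error) {
	mounted, err := mountinfo.Mounted(mountPoint)
	if err != nil {
		return recordedCount, err
	}
	if !mounted {
		// A cleanup process may have been killed after unmounting but before
		// it updated the mounts info; reset the stale count.
		return 0, nil
	}
	return recordedCount, nil
}

func main() {
	count, err := reconcile("/var/lib/containers/storage/overlay/abc123/merged", 2)
	fmt.Println(count, err)
}
```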

mtrmac commented 3 weeks ago
> > It can’t currently be mountsLockFile because (related to this PR) read-only stores might not have that lock at all, and/or might not be able to lock it for writing.
>
> I don't understand why it would be a problem. If mountsLockFile is not usable, then it must not be possible to mount an image

An example of the relevant race is between layerStore.Mount changing layer.MountPoint vs. layerStore.Get making a copy of the string. A race here (e.g. a read in the middle of a torn write) results in undefined behavior. I think you’re right that those races are not reachable if mountsLockFile is not usable (assuming we rigorously document that the fields driven by mountpoints.json are never modified if the lock is nil).
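
For concreteness, a hypothetical snippet with the same shape as that race; the struct only mirrors the field names from the discussion, and running it under `go run -race` makes the detector flag the unsynchronized accesses.

```go
// Hypothetical race reproduction: a writer updates MountPoint/MountCount while
// a reader copies the struct, with no common lock between them.
package main

import (
	"fmt"
	"sync"
)

type layer struct {
	MountPoint string
	MountCount int
}

func main() {
	l := &layer{}
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: roughly what Mount does while holding only the mounts lock.
	go func() {
		defer wg.Done()
		l.MountPoint = "/var/lib/containers/storage/overlay/abc/merged"
		l.MountCount++
	}()

	// Reader: roughly what Get does under a read lock that does not exclude
	// the writer above; copying the string can observe a torn write.
	go func() {
		defer wg.Done()
		copied := *l
		_ = copied
	}()

	wg.Wait()
	fmt.Println("done (the race detector reports the conflict)")
}
```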

But, also…


The ABI is such that store.Layer.Mount{Point,Count} always exist; we don’t get to declare a value “unavailable”. Sure, that’s a mistake, and any out-of-c/storage consumers of those values are unfixably racy, but there are such consumers in Podman, and we need to somehow deal with them.

So far, the design has been that only layers in the primary store get mounted; for the other stores, the layers would have no mount point and zero mount count, and that’s AFAICT consistent and correct (unless a primary layer store in one process is a secondary layer store in another process — the latter one wouldn’t see counts).

If, after this PR, RO layer stores can be mounted, but that data may or may not be available, without any reporting of which ( https://github.com/containers/storage/pull/1944#discussion_r1625927036 ), what does that mean for consumers?

And “mountsLockFile is not usable” is not necessarily a system-wide consistent value: the file might exist, but some processes might fail to use it, e.g. due to SELinux.


> Anyway, mount.Mounted() is the ultimate truth about whether a mount exists or not, since the process could have been killed between performing the mount/unmount and updating the mounts info.

Because we reference-count unmounts, mount.Mounted() is not sufficient, but, plausibly, we could do the reference counting without involving inProcessLock — if we didn’t expose that data in Layer.
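
A minimal sketch of why a bare mounted check is not enough once unmounts are reference-counted (hypothetical types, not the c/storage implementation):

```go
// Hypothetical reference-counted mount: only the last release really unmounts.
package main

import (
	"fmt"
	"sync"
)

type refCountedMount struct {
	mu         sync.Mutex
	mountPoint string
	count      int
}

// acquire mounts on first use and just bumps the reference count afterwards.
func (m *refCountedMount) acquire(doMount func() (string, error)) (string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.count == 0 {
		mp, err := doMount()
		if err != nil {
			return "", err
		}
		m.mountPoint = mp
	}
	m.count++
	return m.mountPoint, nil
}

// release unmounts only when the last user is done; a plain "is it mounted?"
// check cannot tell whether other users still depend on the mount.
func (m *refCountedMount) release(doUnmount func(string) error) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.count == 0 {
		return nil
	}
	m.count--
	if m.count == 0 {
		return doUnmount(m.mountPoint)
	}
	return nil
}

func main() {
	m := &refCountedMount{}
	mount := func() (string, error) { fmt.Println("mounting"); return "/mnt/layer", nil }
	unmount := func(mp string) error { fmt.Println("unmounting", mp); return nil }

	m.acquire(mount)   // mounts
	m.acquire(mount)   // second user, just bumps the count
	m.release(unmount) // count 2 -> 1, still mounted
	m.release(unmount) // count 1 -> 0, unmounts
}
```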

Now, I agree that the data in Layer.Mount* should, in principle, not exist, but I also don’t see a short-term path for avoiding that. The “easy” start for that change would be to remove all consumers of those fields, and that already looked non-obvious IIRC. I don’t know how we would handle correctness across multiple versions of c/storage e.g. during an upgrade or if Podman/Buildah/CRI-O get out of sync. (Also https://github.com/containers/storage/pull/1944#discussion_r1626442118 )