allow relative paths for host volumes

ibukanov commented 9 years ago

From a deployment point of view, it would be nice to support relative paths in the host volume specification. Such volumes would be created under an implementation-defined root, which avoids hard-coding the layout on the host. One can argue that even with absolute paths nothing prevents the implementation from using an extra chroot to store volumes, but it would be nice to allow that explicitly in the spec.

jonboulle commented 9 years ago

@ibukanov Hmm, the whole point of the three-tiered volume system at the moment is to avoid hard-coding layouts; the idea is that an executor reifies the volume<->mount mapping locally. Do you think we can close this out in favour of the discussion in #364?

ibukanov commented 9 years ago

I do not see how this is related to #364, which talks about mount points inside containers, while this is about placing containers on the host.

The current spec talks about host volumes with absolute paths on the host. As I understand it, the intention is that those volumes are independent of any pod/container and should be kept even after the pod is removed. Moreover, they can be used by several pods. Compare that with empty volumes, which can be removed after the last container that uses them is removed and which are inaccessible from other pods.

The problem is that by insisting on absolute host paths for such shared persistent volumes, the spec forces volume paths on the host to be hard-coded. What is desirable is a way to specify a persistent volume that several pods can use without specifying a host path.

Supporting relative paths against an implementation-defined location is the simplest way to do that. An alternative is to have a third volume kind, like shared, where the path specifies an abstract location mapped to a directory on the host in an implementation-defined way.

jonboulle commented 9 years ago

> The problem is that by insisting on absolute host paths for such shared persistent volumes, the spec forces volume paths on the host to be hard-coded

This is the part I don't get. Can you please point out where you're seeing it hard-coded? An example would be helpful. Thanks!

ibukanov commented 9 years ago

@jonboulle Currently the spec requires that, for the volume kind host, the source must be an absolute path like /opt/tenant1/work. Thus the pod file needs to assume a particular filesystem layout on the host. It would be nicer if the source could be set to just, say, tenant1/work, letting the implementation decide where to put it. The only restriction is that if another pod also specifies tenant1/work, it should refer to the same directory on the host.
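To make the proposal concrete, here is a minimal sketch of how an executor might resolve a host volume source. It is not part of the spec: the root directory `/var/lib/executor/volumes` and the function name `resolve_host_source` are assumptions chosen for illustration. Absolute sources are used as-is, and relative ones are joined to the implementation-defined root, so two pods naming tenant1/work end up at the same host directory.

```python
import posixpath

# Hypothetical implementation-defined root; a real executor would choose its own.
VOLUME_ROOT = "/var/lib/executor/volumes"


def resolve_host_source(source, root=VOLUME_ROOT):
    """Map a host volume source to an absolute path on the host."""
    root = posixpath.normpath(root)
    if posixpath.isabs(source):
        # Current behaviour: an absolute path is taken literally.
        return posixpath.normpath(source)
    # Proposed behaviour: a relative path is resolved under the root.
    resolved = posixpath.normpath(posixpath.join(root, source))
    prefix = root.rstrip("/") + "/"
    if resolved != root and not resolved.startswith(prefix):
        # Refuse sources such as "../etc" that would escape the root.
        raise ValueError("volume source escapes the volume root: %r" % source)
    return resolved
```

Because the mapping depends only on the root and the relative source, every pod on the same host that asks for the same relative path gets the same directory.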

jonboulle commented 9 years ago

@ibukanov The expected flow here is that the ACE (which is running on a particular host and aware of that host's particular filesystem layout) late-binds the volume source. Until the point that the manifest is being executed, it doesn't even have that volume specification present. I concede that this isn't as clear as it should be today (we aren't effectively demonstrating pod templates vs. reified manifests). But does that make sense/serve your use case?
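A rough sketch of that late-binding flow, under stated assumptions: the dictionary shapes below (`apps`, `mounts`, `volume`, `path`, `volumes`, `kind`, `source`) are simplified stand-ins rather than the exact manifest schema, and `reify` and `host_bindings` are hypothetical names. The template carries no volume specification; the executor adds one from its host-local knowledge when it runs the pod.

```python
import copy


def reify(template, host_bindings):
    """Return a reified copy of a pod template with host volumes bound.

    template:      pod description whose apps reference volumes by name only.
    host_bindings: host-local mapping of volume name -> absolute host path.
    """
    manifest = copy.deepcopy(template)  # leave the template itself untouched
    needed = sorted(
        {mount["volume"]
         for app in manifest["apps"]
         for mount in app.get("mounts", [])}
    )
    missing = [name for name in needed if name not in host_bindings]
    if missing:
        raise KeyError("no host binding for volumes: %s" % ", ".join(missing))
    # The volume section exists only in the reified manifest.
    manifest["volumes"] = [
        {"name": name, "kind": "host", "source": host_bindings[name]}
        for name in needed
    ]
    return manifest
```

In this sketch the host path appears only in the output of `reify`, so the same template can be run on hosts with different filesystem layouts.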

ibukanov commented 9 years ago

@jonboulle Now I see how late bindings solve the issue. Clearly, it would be nice to clarify that.

However, late binding immediately raises the question: what if I want to refer to the same volume from another pod? Is the answer that an administrator should rely on an unspecified tool to ensure that the volumes in two pods point to the same directory on the host? Perhaps in that case it is better to drop the volume specification from the pod manifest and just state that the mapping of volume names to host directories is host- and deployment-specific?

jonboulle commented 9 years ago

> However, late binding immediately raises the question: what if I want to refer to the same volume from another pod? Is the answer that an administrator should rely on an unspecified tool to ensure that the volumes in two pods point to the same directory on the host?

Well yes, I'm not sure what the alternative is. I can't see any sensible way for this to be expressed in the spec.

> Perhaps in that case it is better to drop the volume specification from the pod manifest and just state that the mapping of volume names to host directories is host- and deployment-specific?

Maybe, but then arguably we should remove them from the imagemanifest too (#364). The point of them being in the reified pod manifest today is that it provides a definitive document of the pod's execution environment. Anything defined in the spec but overridden by the executor (environment variables, execution parameters, ...) is recorded in the podmanifest.

ibukanov commented 9 years ago

The host volume source is meaningless, as on modern Linux an absolute path requires a mount namespace reference to make any sense. This is rather different from execution parameters or variables, as those apply to the container itself, not the host. Plus, the current separation of volumes into host/empty is too restrictive. What if the executor uses periodic GC to remove no-longer-used volumes based on time/disk usage or some administrator-defined policy?

So I think it is really best to remove the volume bindings from the spec and let the executor manage them as it wants. If the executor adds volumes and mounts beyond those in the manifest, the reified manifest should just list those using autogenerated names or perhaps ids, without specifying where and how they are represented on the host.

As for #364, my preference is to eliminate the notion of mount point names and to allow binding volumes directly to paths in the container.

jonboulle commented 9 years ago

> The host volume source is meaningless, as on modern Linux an absolute path requires a mount namespace reference to make any sense

Well presumably "host volume" => "host mount namespace" (or technically "mount namespace in which the executor is running" to be particularly pedantic). I certainly agree that it is not particularly useful from within the pod itself to introspect this. But for auditing purposes etc., it still seems to have utility to know that "when this pod ran, it had this volume from its source runtime context".

> Plus, the current separation of volumes into host/empty is too restrictive. What if the executor uses periodic GC to remove no-longer-used volumes based on time/disk usage or some administrator-defined policy?

I really don't understand why this is too restrictive, or part of the remit of the spec at all; in my eyes it's simply co-ordination to be handled by the implementation.

> If the executor adds volumes and mounts beyond those in the manifest, the reified manifest should just list those using autogenerated names or perhaps ids, without specifying where and how they are represented on the host.

At this point the volumes no longer seem to have any meaning to me; I would be more inclined to remove them entirely.

ibukanov commented 9 years ago

@jonboulle said:

> At this point the volumes no longer seem to have any meaning to me; I would be more inclined to remove them entirely.

Unless I missed something, the notion of the volume name is the only way in a pod to declare that mount points in different containers point to the same storage.
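As an illustration of that role, the sketch below groups mount points by volume name to find storage shared between containers. The pod structure is a simplified, hypothetical shape (keys `apps`, `mounts`, `volume`, `path`), not the exact manifest schema, and `shared_storage` is a name made up for this example.

```python
from collections import defaultdict


def shared_storage(pod):
    """Return volume names used by more than one app, with their mount points."""
    users = defaultdict(list)
    for app in pod["apps"]:
        for mount in app.get("mounts", []):
            users[mount["volume"]].append((app["name"], mount["path"]))
    # Keep only the volumes that link mount points in different apps.
    return {
        name: uses
        for name, uses in users.items()
        if len({app_name for app_name, _ in uses}) > 1
    }


# Hypothetical pod: two apps mount the volume "work", one mounts "logs".
example_pod = {
    "apps": [
        {"name": "producer", "mounts": [{"volume": "work", "path": "/data/out"}]},
        {"name": "consumer", "mounts": [{"volume": "work", "path": "/data/in"},
                                        {"volume": "logs", "path": "/var/log/app"}]},
    ]
}
```

Without the shared name there would be nothing in this structure tying /data/out in one container to /data/in in the other.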

appc / spec

allow relative paths for host volumes #376