Open ijc opened 7 years ago
Looks like a decent proposal:
Copy(dst, src string)
, since we've followed that here in other contexts.Resource
needs a copyable method, but we need to push this down to the driver, so they can make efficient decisions. Ideally, we'd have a CopyTree
function (or some clever naming) at the root that can be used in common cases and then one can fall back to drivers for more complex uses (I am copying from VM remote to some specialized thing).Ack to your first two bullets.
Not sure I quite follow the third. By "I don't Resource
needs a copyable method" did you mean "I don't think Resource
needs a Copy
method" or "I don't think Resource
needs a Copyable
interface"?
I think perhaps you are saying both, since Resource
doesn't have visibility onto a Driver
. That implies we should add the copying for each resource type to the toplevel context.CopyTree
you propose which would then know how to use the drivers to duplicate each kind of Resource
?
Hrm, might we want different drivers for source (implied in the context) and destination (would need to be passed?) Perhaps destination should also be a(n empty) Context
?
Not sure I quite follow the third. By "I don't Resource needs a copyable method" did you mean "I don't think Resource needs a Copy method" or "I don't think Resource needs a Copyable interface"?
"I don't think Resource needs a Copyable interface"
Resource
is not supposed to be bound to an environment, although it may draw on resources, such as disk storage.
I'm not sure how tightly we should integrate with drivers. Perhaps, only as much is needed to avoid repeating logic in getting resources from a given environment.
I guess that would mean there is a version that takes paths and instantiates the environments (contexts) from paths and another that works solely from environments. Does that make sense?
I take it your use of "environment" assume #78 has happened and will try to interpret with that in mind and use that term throughout.
So I think you are proposing two functions. First a
func Duplicate(dstroot, srcroot string) error
That would build an Environment
(née Context
) for srcroot
and dstroot
and then call our other new function:
func DuplicateEnvironment(dst, src, Environment) error
Or should this be:
func (src *Environment) Duplicate(dst Environment) error
In either case this will then iterate over the Resource
s in src
and call their Copy
methods in turn.
My original proposal had those Copy
methods having this signature:
Copy(srcroot, dstroot string) error
Which makes it impossible to integrate with drivers at all. Maybe this is signposting me towards the idea that the duplication should be happening in DupicateEnvironment
(with a big type switch) rather than pushing down via a new Copy
method.
To do this you would need to be able to "readish" operations on src.driver
and "writeish" operations on dst.driver
. Since driver
is private in Environment
either one or both of those is going to be inaccessible in any particular method. The first option I see there is to add Driver
method to Environment
to expose it (or to make the member public). WDYT? I'm not immediately enamoured with that idea.
An alternative model just occurred to me:
func (src *Environment) Duplicate(dst string) (*Environment, error)
func (src *Environment) DuplicateWithOptions(dst string, options EnvironmentOptions) (*Environment, error)
i.e. take the dst
as a string and return a new Environment
referencing the newly duplicated tree. Now we have access to src.driver
since it is a method on src
and we can have access to the dst
driver because we are creating it and can hold onto it.
@ijc Yes, I am using "environment" because "context" is confusing in the context of the existence of the context package.
That proposal looks good. As far as driver access goes, I think it is reasonable to expose something on environment to allow implementations to access platforms directly for performance optimizations.
One other thing to consider is the behavior: do this "sync" (match the set) or "copy" (destination is now a superset of both src and dst). Are they destructive?
That proposal looks good.
There was at least 3 in https://github.com/containerd/continuity/issues/95#issuecomment-334187464 ;-) Do you have a favourite?
One other thing to consider is the behavior: do this "sync" (match the set) or "copy" (destination is now a superset of both src and dst). Are they destructive?
In the name of simplicity (and because I'm in inherently lazy) I'd prefer to mandate that the destination not exist at all upon entry for now. If we come up with a need to "overlay" src onto dst that should be doable in a backwards compatible way I think.
There was at least 3 in #95 (comment) ;-) Do you have a favourite?
There all my favo[u]rite! I guess I was saying the general direction looks good.
I think the focus should be on the top-level, path-based function Duplicate
(maybe CopyTree
or SyncTree
, we can bike shed it later). Pushing down some functionality to the environment seems like a good implementation detail.
I think the main implementation would be something like this:
func Duplicate(dst, src string) error {
return DuplicateEnvironment(local.New(dst), local.New(src))
}
Basically, its a helper for doing the easy thing and then you can use the other parts when something special is required.
However, I do see the benefit of making the "environment" one-sided, in that you sync to a local directory, but this may have limitations if you are syncing two environments that are only accessible from their drivers.
Stepping back, I may have incorrectly rejected one of the original ideas to have this on the manifest, because something like this makes a lot of sense:
func (m *Manifest) Apply(env Environment, provider Provider)
The above would let you apply the manifest to an environment, retrieving the content from the given provider, which might let you get direct access to fds for using things like CopyFileRange
. We may already have a concept similar to the above, however, it would be nice to avoid the creation of the manifest, as hashing the whole filesystem is expensive when your syncing to an empty directory tree.
@dmcgowan Any thoughts here?
it would be nice to avoid the creation of the manifest, as hashing the whole filesystem is expensive when your syncing to an empty directory tree.
I'm unclear if hashing is a necessary part of the hardlink stuff. The hardlinkManager
doesn't hash but the hlm.Merge
calls the package-level Merge
which does seem to pay some attention to the digests.
So it's not the Manifest
as such which causes all the hashing but rather the fact that it creates all the Resources
. While we might be able to avoid building the Manifest
I'm not so sure we can avoid creating Resource
objects, can we?
I think we certainly want to DTRT with hardlinks in this functionality.
I finally started looking into implementing this week (before I got distracted again)... It looks like continuity.ApplyManifest
already covers most of the use case here, in my copious free time I'll look into massaging it to suite my needs, likely by coercing ./bin/continuity apply
to be able to dup a manifest
+source tree
in the first instance.
One thing which seems to be missing is timestamps, adding that would require reving the protobuf descriptions I think. Need to think a bit harder about whether that is a requirement for my use case (copy up from underlying image into a volumes for cri-containerd) at first glance I think it probably is...
One thing which seems to be missing is timestamps
I think we need to add timestamsp, regardless. cc @AkihiroSuda
I know there was a PR for this.
Oh sorry for leaving the PR https://github.com/containerd/continuity/pull/88 for months.. I'll be work on the PR today
@AkihiroSuda I think we were undecided on #88 but I think it is clearly a necessity now.
My use case for this was to achieve the same thing which https://github.com/containerd/cri-containerd/pull/535 did much more simply.
Given that I no longer have a pressing need for this functionality. Probably this issue can be closed, unless you would prefer to keep it around for the design discussion should this need arise again in a different context?
Background
Following on from https://github.com/containerd/containerd/issues/1482#issuecomment-332784539
Docker style volumes (those declared with
VOLUME /path
in aDockerfile
) have the property that they are initialised using the contents/path
in underlying image. In order to produce a helper which can allow containerd clients to support such functionality a safe mechanism to duplicate a filesystem trees is required.Safe here means at least:
Since continuity can safely and fully represent a filesystem tree extending it to allow duplicating that tree seems reasonable.
Proposal
I propose adding a single method to the existing
Resource
interface:This method will (given
p := resource.Path()
) duplicatesrcroot/p
todstroot/p
, preserving permissions and ownership. If theResource
implements theXattrer
interface thenxattrs
will also be preserved. Likewise if theResource
implementsHardlinkable
then all hardlinks will be created. (similarly for any other interfaces I've missed or which are added in the future)In normal expected usage
srcroot
would be the root of aContext
. Therefore a new method will also be added to theContext
interfaces:This will
BuildManifest(self)
, sort the result usingByPath
and then iterate callingresource.Copy(self.path, dstroot)
for each resource.Variations
Copying
Xattrer
andHardlinkable
as responsiblity of callerThe support for
Xattrer
andHardlinkable
in each implementation ofResource
seems likely to be quite similar, which likely implies the addition of a helper for each.We could therefore consider excluding handling of those from the
Resource.Copy
method and instead push the responsibility back to the callers (so,Context.Copy
here) using those same helpers.Perhaps:
I lean towards requiring
Resource.Copy
to completely copy all known properties since this seems simpler. The downside is that callers cannot then opt out of copying certain properties.Copyable
interfaceRather than adding a new method to
Resource
we could add a new interface:However since all Resources would have to support the method in order for the overall functionality to be reliably usable there doesn't seem much point.