Add `duffle rewrite` command

technosophos commented 6 years ago

This command is the first step toward an export/import workflow. I think we should implement it as a stand-alone because we don't necessarily always want to export when we rewrite.

Here's an example of what I imagine the command would look like:

$ duffle rewrite -f path/to/bundle.json technosophos.azurecr.io/foo

Here's an example source bundle.json:

{
    "name": "hellohelm",
    "version": "0.1.0",
    "invocationImage": {
        "imageType": "docker",
        "image": "cnab/lakebase:latest"
    },
    "images": [
        {
            "name": "alpine",
            "uri": "technosophos/demo2alpine:0.1.0",
            "refs": [
                {
                    "path": "cnab/app/charts/alpine/values.yaml",
                    "field": "image.repository"
                }
            ]
        },
               {
            "name": "alpine",
            "uri": "quay.io/technosophos/something:0.2.0",
            "refs": [
                {
                    "path": "cnab/app/charts/alpine/values.yaml",
                    "field": "image.deployment"
                }
            ]
        }
    ],
    "parameters": {},
    "credentials": {}
}

Okay, my interpretation of the original proposal is that running hte above command on the above bundle.json should do the following:

The arg[1] should be interpreted as "rewrite the repo half of a a docker image to technosophos.azurecr.io/foo'
The invocation image should be rewritten to technosophos.azurecr.io/foo/lakebase:latest
The first and second image should be rewritten to:
- technosophos.azurecr.io/foo/demo2alpine:0.1.0
technosophos.azurecr.io/foo/something:0.2.0
The file cnab/app/charts/alpine/values.yaml should be opened and following two things should be replaced:
- image.deployment should be set to technosophos.azurecr.io/foo/something:0.2.0
- image.repository should be set totechnosophos.azurecr.io/foo/demo2alpine:0.1.0`

So the resulting bundle.json should be:

{
    "name": "hellohelm",
    "version": "0.1.0",
    "invocationImage": {
        "imageType": "docker",
        "image": "technosophos.azurecr.io/foo/lakebase:latest"
    },
    "images": [
        {
            "name": "alpine",
            "uri": "technosophos.azurecr.io/foo/demo2alpine:0.1.0",
            "refs": [
                {
                    "path": "cnab/app/charts/alpine/values.yaml",
                    "field": "image.repository"
                }
            ]
        },
               {
            "name": "alpine",
            "uri": "technosophos.azurecr.io/foo/something:0.2.0",
            "refs": [
                {
                    "path": "cnab/app/charts/alpine/values.yaml",
                    "field": "image.deployment"
                }
            ]
        }
    ],
    "parameters": {},
    "credentials": {}
}

And the resulting cnab/app/charts/alpine/values.yaml should be:

# Original stuff...

images:
  repository: technosophos.azurecr.io/foo/demo2alpine:0.1.0
  deployment: technosophos.azurecr.io/foo/something:0.2.0

# Other stuff from original values.yaml

We will need to decide in the future whose job it is to push the old images to the new location.

radu-matei commented 6 years ago

This looks like a great start - how about the tag name for this command? It follows the Docker analogy, plus it is a bit easier to write.

jeremyrickard commented 6 years ago

@technosophos when I read through this, I made the assumption that someone would have done a duffle pull before running this so they would have the CNAB and bundle.json locally. Is that valid?

technosophos commented 6 years ago

I like duffle tag instead of rewrite.

@jeremyrickard Yeah. You should be able to safely assume that:

The CNAB bundle is local
The Docker images are all available in the Docker daemon that the host currently points to

If those are not true, it is fair to error out.

jeremyrickard commented 6 years ago

+1 to duffle tag

itowlson commented 6 years ago

duffle tag to me implies adding or changing a tag on the bundle, not modifying the bundle to point to a different repo. Subjective though I know...!

bacongobbler commented 6 years ago

To me, if I intend to re-tag a "thin" bundle via duffle tag, I expect the name and tag of the bundle to change, but not the parameters or underlying image tags. In other words, I only expect the name and version fields of the bundle.json to be changed and nothing else.

I envisioned duffle tag to be incredibly similar to docker tag:

docker build -t foo:latest
docker tag foo quay.io/bar:canary
docker push quay.io/bar:canary

The implication of the above use case is that I have not modified the underlying image layers (or the bundle, in our case), only the name and tag of the image (bundle).

That being said, we've been tossing around the concept of a "thick" bundle. A "thick" bundle is a bundle.json that defines the bundle and references the invocation images not by image name (technosophos/helloworld), but by the image layers that actually make up technosophos/helloworld.

Once a "thick" bundle has been re-tagged, duffle push can (and will) push the bundle along with all of its image layers in one go to the targeted registry, capturing about 90% of the same use case @jeremyrickard showed in standup. Once we actually go and implement the darned thing, we may want to consider making "thick bundles" the default output of duffle build to handle this use case. :)

The last bit @jeremyrickard demonstrated in standup today (re-tagging the refs field) would also likely go away if we switch to thick bundles by default.

We don't have a concrete example of a "thick" bundle to show yet, but I will update #215 with an example on how the bundle.json would look. Would love to pair with you on this one @jeremyrickard!

jeremyrickard commented 6 years ago

I envisioned duffle tag to be incredibly similar to docker tag:

@carolynvs and I were discussing this today and she had the same immediate thought I believe.

This morning I also tried to do an end-to-end sort of test with this and I think that how this issue was written (and how I implemented the command) will probably lead to some confusion for people.

Using hellohelm as an example:

It's bundle.json is currently:

{
    "name": "hellohelm",
    "version": "0.1.0",
    "invocationImage": {
        "imageType": "docker",
        "image": "cnab/hellohelm:latest"
    },
    "images": [
        {
            "name": "alpine",
            "uri": "technosophos/demo2alpine:0.1.0",
            "refs": [
                {
                    "path": "cnab/app/charts/alpine/values.yaml",
                    "field": "image.repository"
                }
            ]
        }
    ]
    ....some stuff omitted...

If we use the command spec for this issue and runs something like duffle tag -f hellohelm/bundle.json jeremyrickard that would get re-written to:

{
    "name": "hellohelm",
    "version": "0.1.0",
    "invocationImage": {
        "imageType": "docker",
        "image": "jeremyrickard/hellohelm:latest"
    },
    "images": [
        {
            "name": "alpine",
            "uri": "jeremyrickard/demo2alpine:0.1.0",
            "refs": [
                {
                    "path": "cnab/app/charts/alpine/values.yaml",
                    "field": "image.repository"
                }
            ]
        }
    ]
...stuff omitted

The values.yaml, however, is structured like:

image:
  pullPolicy: IfNotPresent
  repository: jeremyrickard/demo2alpine:0.1.0
  tag: 3.3
restartPolicy: Never

The difference here from the example given above is that we have the repository and tag attributes broken out, so running the tag command on this actually results in a broken end result. Doing the replace like that in the referenced things implies some structure that I'm not sure we've constrained. Rewriting the repository for the invocation image makes more sense, I think, but updating the references for something like this case seems not the best?

Maybe we can discuss at the standup or have a separate chat about this?

carolynvs commented 6 years ago

My immediate confusion was around the word "tag" because with docker, tag refers to the IMAGE:TAG format. I thought that I was going to be retag ponysparkles:abc123 to ponysparkles:v1.1, not just change the repository.

I may be missing the underlying use case for when someone would want to use this functionality. Naively, if I am only changing where it "lives", but not the underlying content, it almost sounds like a fork. Not that we need to introduce anymore confusing analogs though! 😀

jeremyrickard commented 6 years ago

OK! @bacongobbler and I had a little chat on this and I'll summarize below.

Point the first: There are probably two conceptual use cases here:

1.) "Rewriting" the image references in the bundle (i.e. invocation image and images array). We should probably not use "tag" here because semantically it isn't the same as the docker command.

2.) "Tag" a bundle, which would change it's name (note, i'm not suggesting it should work like below, just a notional example).

$ duffle search
NAME                REPOSITORY                  VERSION
hellohelm           github.com/deis/bundles.git 0.1.0
$ duffle tag hellohelm:0.1.0 jeremyhelm:1.0

This probably just updates the name and version tags in the bundle.

Point the 2nd: We think that rewrite makes sense for thin bundles, but probably not once we talk about thick bundles. While discussing this, we wondered if possibly thick bundles should be the default, with the related question: for the import/export case, should a thin bundle get converted to a thick bundle?

Point the 3rd: The rewrite command as a standalone command has two issues that we should consider:

Currently, the refs array for each image in the images array can point to an arbitrary file. In the cases here, it's a yaml file, but it could be a JSON file (consider the case of ARM templates) or something else. Currently, this field isn't actually used anywhere, but when doing a rewrite on the hellohelm example, we end up in a bad state because the bundle.json uses a reference with the field "image.repository" in the associated values.yaml for the chart, however, the values.yaml expects that to only be the repository portion, not the image with the repository portion rewritten (i.e. it expects alpine, not jeremyrickard/alpine:3.3)
- Assuming it's successfully retagged the images locally, there still needs to be a step that pushes those to a remote location. duffle doesn't have an associated command to push all the references docker images to a repository. I'd suggest if we're going to rewrite the repositories for a thin bundle, perhaps this command should also do a docker push after it's done a docker tag for the user. Otherwise, I'm not able to do duffle rewrite followed by a duffle installto something remote like Kubernetes because it's going to reference non-existant images.

So in summary I think the action items are:

1.) We should probably change this back to rewrite instead of tag 2.) We should create a tag command that just changes the name and version in the bundle 3.) We should determine if the rewrite command should push images after they've been rewritten and if this should only apply to thin bundles, and if so, maybe it doesn't need to exist at all after thick bundles are a thing?

@bacongobbler feel free to add any more around this!

cc: @technosophos @itowlson @carolynvs @radu-matei

bacongobbler commented 6 years ago

No, this summarized our discussion perfectly. Thanks @jeremyrickard for the thorough write-up!

To expand a little bit on one point:

for the import/export case, should a thin bundle get converted to a thick bundle?

From early discussions with @jeremyrickard, this definitely seemed like something we should consider. Imagine the following use case:

I, as a developer, want to export my thin bundle from my desktop, throw it on a USB stick, and import it on my laptop/cloud/IoT-enabled cat.

Now, imagine the process to export/import a bundle onto your IoT-enabled cat takes several days. When we export a thin bundle, we only have the image references, so we pull down the image layers and export it into the bundle. When we re-import it, we probably want to reference the layers placed on the USB stick rather than what's currently sitting in technosophos/helloworld, right? In the process it took to export/import, technosophos/helloworld may have been updated, which means that we'd miss out on the basic use case of export/import if we re-convert back to a thin bundle. In that case, we'd need to convert the manifest to a thick bundle to reference the local layers.

That also brings up another question: should we consider making thick bundles (once they're actually a thing and not some concept in our heads :smile:) the default output of duffle build? duffle rewrite may no longer be required if references to image layers stick with the bundle throughout its lifecycle. The rest of the infrastructure isn't ready to support thick bundles yet, but it does pose an interesting question in regards to the feasibility of this feature. It'd make duffle pull and duffle push much slower (because it needs to fetch all the layers), but duffle install et al would be significantly faster since all the layers would be on-disk, making for a speedy runtime experience.

Probably starting to veer a little off-topic from the current proposal (duffle rewrite/duffle tag), so please let me know if I should take this discussion elsewhere.

technosophos commented 6 years ago

Okay... wow... there is a ton to cover on this thread. Alright, I will try to articulate how this plays out in my head. then we can keep discussing.

duffle tag: So if I understand tag correctly, the suggestion is to just turn that into a versioning tool for the bundle itself? (e.g. make it do what docker tag does) That sounds fine with me. In this case, we're just moving the bundle name/version and the invocation image name/version.
duffle rewrite: My understanding is that you are thinking that this would be the command to do the tag logic plus the modifications to the images: [] and the respectively referenced files. That sounds fine to me. That name is a little more clear.
duffle export/duffle import: I am sorry that I didn't articulate this case more clearly before. Here's what the original proposal suggested:
- duffle export takes the bundle.json and all of the images and writes them into a single large file -- a thick bundle. duffle export is never used to produce thin bundles. We could allow it to rewrite names, but I don't know why we would do that.
- duffle import takes a mandatory argument that tells the importer how to name the imported images. That part is essentially the same logic as we would do in duffle rewrite, but only as part of a larger process that would unpack the images and then push them all. (We are importing into the local Docker daemon)
  - import ALWAYS uses the binaries it reads from the full bundle, and NEVER fetches those from the network. The name field here is only used to correlate between the images in the bundle.json and the images in the tar file. For example, in the tar file, the filename field will likely be the image name (e.g. foo/bar:latest will likely be the tar entry named foo/bar:latest).
  - The purpose of forcing the rename is to indicate that the provence between the original upstream (e.g. technosophos/hellohelm) is now lost. Essentially, exporting and importing is forking. The burden of re-aligning onto the upstream technosophos/helm is now a burden placed on the operator.
  - For this reason, we could break with the original spec and do the rename on duffle export and that would probably preserve the intention to fork. Then the rename on import would be optional. However, that seems more complex, since it's really during the import that we know what we want our destination to be.
- duffle import may have a --push-images flag to push those images into a registry, but this is not the default. In the IoT case, there may not be a registry. The master node on the device tree will be doing all of the work locally in its moby instance.
- in the original spec, there was no tag or rewrite command. I added that because I think this logic is useful for the thin case. So in my mind (and I did not do a good job of communicating this), duffle rewrite is really only for thin images.
The spec requires that the rewriting patterns work on JSON, YAML, and XML: https://github.com/deis/duffle/blob/master/docs/101-bundle-json.md#the-image-list (why those three? Kubernetes, ARM, and Service Fabric).
- The replacement logic should be sophisticated enough that it can read an image field and parse it the way Docker does (e.g. can handle image, image:tag, org/image, org/image:tag, server/org/image, server/org/image:tag...)
- When it comes to pushing these images, we can go either way on duffle rewrite: We can inform devs that they need to push their own images (in which case, we should probably provide a tool). That is what Docker does. Or we could push for them.
- Keep in mind that the images array may also be used to store VM images, which will have a very different set of rules which we have not yet discovered/invented.
No, we should not make thick bundles the default. They will be the 10% use case for Duffle (if even that). And they will be huge in most cases. And ironically, install time is not necessarily what we are optimizing for, since most cloud services already take a long time to standup.

In summary:

Yes, split tag and rewrite
rewrite should operate on thin bundles
We should probably add some sort of duffle push-images command or add that as a flag to rewrite and import (but do not push by by default)
Think of duffle export and duffle import as a forking strategy where the exporter is deliberately separating the package from its upstream.
We should not build full bundles by default, as that is not our anticipated usage pattern

Phew... I need a donut and a coffee now (in that order).

jeremyrickard commented 6 years ago

@technosophos, wanted to follow up on this on hopefully a last point of clarification.

From your clarification above:

The replacement logic should be sophisticated enough that it can read an image field and parse it the way Docker does (e.g. can handle image, image:tag, org/image, org/image:tag, server/org/image, server/org/image:tag...)

And from the spec:

TODO: How do we specify multiple replacements within a single file?

If I have an input, like the hellohelm example:

image:
  pullPolicy: IfNotPresent
  repository: alpine
  tag: 3.3
restartPolicy: Never

What would that file look like if I run rewrite like:

duffle rewrite -f ~/cnab-bundles/hellohelm/bundle.json jeremyrickard

Currently, the bundle.json in that example is:

"images": [
        {
            "name": "alpine",
            "uri": "technosophos/demo2alpine:0.1.0",
            "refs": [
                {
                    "path": "cnab/app/charts/alpine/values.yaml",
                    "field": "image.repository"
                }
            ]
        }
    ],

With the bundle.json as-is, I'd expect to only replace image.repostory with something like jeremyrickard/demo2alpine, but the image.tag would be unchanged (and incorrect in this case).

If I add I add another ref to the same file, is that valid? i.e..

{
      "path": "cnab/app/charts/alpine/values.yaml",
      "field": "image.tag"
}

Assuming that's valid, it would get updated as well, but the value there should really be 0.1.0. How do we identify that correctly? AFAICT from reading the Docker image spec, 3.3 would be valid image name as well (experimentation shows it is).

docker build -t 3.3 .
Sending build context to Docker daemon   12.8kB
Step 1/3 : FROM cnab/k8sbase:latest
 ---> 0c40883b7dd7
Step 2/3 : COPY app/Makefile /cnab/app/Makefile
 ---> Using cache
 ---> 211940a28786
Step 3/3 : COPY app/charts /cnab/app/charts
 ---> Using cache
 ---> 425c9860d53c
Successfully built 425c9860d53c
Successfully tagged 3.3:latest

Or is that example not valid and the image.tag should reflect the value already in the original bundle.json (which was "uri": "technosophos/demo2alpine:0.1.0") and be set to 0.1.0?

Thanks!

technosophos commented 6 years ago

That is an excellent question. And this is going to be a huge problem if we don't do a good job of it.

What if we did something like allow an optional template field that had support for {{ .Repo }}, {{ .Image }}, and {{ .Tag }}? Do you think that would be flexible enough?

glyn commented 5 years ago

This issue corresponds to what we in the riff project call image relocation. For background, please see the design document Private Docker Registries and riff, the riff issues labelled with private-registry, and the corresponding PRs.

In the context of CNAB, we see two possible approaches to image relocation:

Make image relocation the responsibility of an invocation image (probably packaged in a base invocation image and augmented with tooling). A parameter would specify the target registry and the invocation image would be responsible for tagging and pushing images to the target registry and rewriting kubernetes configuration files in the bundle to refer to the relocated image names. This approach puts the least burden on the CNAB spec and CNAB runtimes.
Make image relocation the responsibility of the CNAB runtime and pass a map from original image name to relocated image name to the invocation image so that it can rewrite kubernetes configuration files in the bundle to refer to the relocated image names. This approach puts the least burden on invocation images and tooling.

Image relocation could apply to thick bundles (probably most suited to an air gapped installation) or thin bundles (where the user has a network zone with read access to the internet and read-write access to the target registry).

[The above information refers to open source versions of riff-distro and riff image commands, but these have been moved to closed source. See the PFS documentation for usage information.]

simonferquel commented 5 years ago

Please note that we already implemented that in cnab-to-oci (see https://github.com/docker/cnab-to-oci/blob/master/remotes/fixup.go#L18).

glyn commented 5 years ago

@simonferquel Thanks for that. I couldn't find the code to update refs to images during fixup. Is that supported?

glyn commented 5 years ago

https://github.com/deislabs/duffle/issues/668 aims to add image relocation support.

simonferquel commented 5 years ago

@glyn https://github.com/docker/cnab-to-oci/blob/master/remotes/fixup.go < this is where it happens (it does multiple things:

for each image / invocation image,
it copies/mount manifests and blobs to the target repository
it modifies the bundle structure to point to the new reference

At the end of the process, the push logic converts the bundle into an OCI index representation and pushed it to the registry

The pull logic parses this OCI index into a duffle bundle structure with all image references being digested and pointing to the same repository

jeremyrickard commented 5 years ago

I believe this should be closed in support of image relocation should it not?

cnabio / duffle

Add `duffle rewrite` command #217