NixOS / nixpkgs

Nix Packages collection & NixOS

dockerTools.buildImage: schema upgrade #75275

Open tomberek opened 4 years ago

tomberek commented 4 years ago

Describe the bug: Docker is deprecating support for image manifest v2, schema 1. https://docs.docker.com/engine/deprecated/#pushing-and-pulling-with-image-manifest-v2-schema-1

To Reproduce: This is already noted at: https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/docker/default.nix#L372

Expected behavior: We should update the image generation to produce the newer schema.

I've tested various upgrades to the dockerTools library where we expose the per-layer representation. This cuts down on repeated tar/pigz compress/decompress cycles. Large images are much faster to manage and iterate on with this style.

FRidh commented 4 years ago

It's targeted for removal in Engine 20.03. We definitely need to fix this or dockerTools becomes rather useless.

grahamc commented 4 years ago

Not a blocker for release. If we get an update to master fixing this issue, we can backport without trouble.

nixos-discourse commented 4 years ago

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/go-no-go-meeting-nixos-20-03-markhor/6495/16

stale[bot] commented 4 years ago

Hello, I'm a bot and I thank you in the name of the community for opening this issue.

To help our human contributors focus on the most-relevant reports, I check up on old issues to see if they're still relevant. This issue has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

The community would appreciate your effort in checking if the issue is still valid. If it isn't, please close it.

If the issue persists, and you'd like to remove the stale label, you simply need to leave a comment. Your comment can be as simple as "still important to me". If you'd like it to get more attention, you can ask for help by searching for maintainers and people that previously touched related code and @ mention them in a comment. You can use Git blame or GitHub's web interface on the relevant files to find them.

Lastly, you can always ask for help at our Discourse Forum or at #nixos' IRC channel.

glittershark commented 4 years ago

this is still important to me

felschr commented 3 years ago

Google Container Registry started giving me "received unexpected HTTP status: 500 Internal Server Error" errors when pushing images built using dockerTools. I'm not entirely sure that it's caused by the old schema, but it seems rather likely.

felschr commented 3 years ago

I reported this on the Google Issue Tracker as well but no response so far: https://issuetracker.google.com/issues/176921663

felschr commented 3 years ago

While GCR isn't giving us any 500 errors anymore, it seems that the images are either broken or that Kubernetes doesn't support them anymore, as our k8s services are failing with ErrImagePull:

Failed to pull image "eu.gcr.io/[project]/[image]:[tag]": rpc error: code = Unknown desc = failed to pull and unpack image "eu.gcr.io/[project]/[image]:[tag]": failed to unmarshal image from schema 1 history: json: cannot unmarshal string into Go struct field ImageConfig.config.Entrypoint of type []string

mbr commented 3 years ago

We ran into this bug as well. Situation:

  1. Ran a cluster on an older Kubernetes version (hosted by DigitalOcean, 1.19?).
  2. Upgraded major versions three times, now at version
  3. All (!) pods based on dockerTools-created images fail to start with the following error:
    rpc error: code = Unknown desc = failed to pull and unpack image "[ourregistry]/[ourimage]:latest": unmarshal image config: json: cannot unmarshal string into Go struct field ImageConfig.config.Entrypoint of type []string

(We use Harbor as a registry.)

We observe the following issues:

Docker version used for local testing: Docker version 19.03.12, build v19.03.12
Podman version used for local testing: podman version 3.0.1

Our whole production system is now dead in the water :( My current guess is that

a. dockerTools produces images that cannot be run by podman, only docker, and
b. kubelet runs containers via podman instead of docker, at least on DOKS

This issue seems to suspect the schema version, but I am unsure at this point. We can run docker manifest inspect ourregistry.com/ourimage:latest:

{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 601,
        "digest": "sha256:7e289c0da1cbae51ba2d3b18c4f5fddf070e7da1ba28bec0d3c31d12a31b8ca9"
    },
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 284587299,
            "digest": "sha256:14c04414b7115b4d6c669beb8784141e4fa0734dec4c3fdeed73077f1b596b23"
        }
    ]
}

It does seem to be version 2.
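For anyone who wants to repeat this check on many images, here is a minimal Python sketch that reads the output of docker manifest inspect and reports the schema version. The function name describe_manifest is made up for illustration, and the schema 1 media types are the ones I recall from the Docker registry documentation, so treat them as an assumption. Note that the manifest only references the image config by digest, so this check says nothing about what the config blob itself contains.

```python
import json

# Media types used by schema 1 manifests (assumed from the Docker registry docs).
SCHEMA1_TYPES = {
    "application/vnd.docker.distribution.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v1+prettyjws",
}


def describe_manifest(manifest_text):
    """Return (schemaVersion, mediaType, is_schema1) for a manifest document."""
    manifest = json.loads(manifest_text)
    version = manifest.get("schemaVersion")
    media_type = manifest.get("mediaType")
    # Treat the manifest as schema 1 if either indicator says so.
    is_schema1 = version == 1 or media_type in SCHEMA1_TYPES
    return version, media_type, is_schema1


# Abbreviated copy of the manifest shown above (layers omitted).
EXAMPLE_MANIFEST = """
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 601,
        "digest": "sha256:7e289c0da1cbae51ba2d3b18c4f5fddf070e7da1ba28bec0d3c31d12a31b8ca9"
    },
    "layers": []
}
"""

if __name__ == "__main__":
    print(describe_manifest(EXAMPLE_MANIFEST))
```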

@felschr Did you manage to solve this?

Edit:

From the Digital Ocean Kubernetes Changelog:

Starting with DigitalOcean Kubernetes 1.20, containerd is used as the container runtime. Previous releases used Docker. This change reflects the upstream deprecation of dockershim in Kubernetes 1.20.

This seems to be in line with the observed issues.

tomberek commented 3 years ago

This would require some care and testing with various registries to ensure nothing breaks. Can you get an idea of what the JSON is that podman receives?

Trying to pull ourregistry.com/ourimage:latest...
  json: cannot unmarshal string into Go struct field ImageConfig.config.Entrypoint of type []string
Error: Error parsing image configuration: json: cannot unmarshal string into Go struct field ImageConfig.config.Entrypoint of type []string

That would help confirm and pinpoint the issue.

mbr commented 3 years ago

> This would require some care and testing with various registries to ensure nothing breaks. Can you get an idea of what the JSON is that podman receives?

I tried by observing the logs from podman --log-level=debug pull ...:

time="2021-08-23T11:13:43+02:00" level=debug msg="GET https://ourregistry.com/v2/"
time="2021-08-23T11:13:44+02:00" level=debug msg="Ping https://ourregistry.com/v2/ status 401"
time="2021-08-23T11:13:44+02:00" level=debug msg="GET https://ourregistry.com/service/token?<redacted>"
time="2021-08-23T11:13:44+02:00" level=debug msg="GET https://ourregistry.com/v2/project/pdf-service/manifests/latest"
time="2021-08-23T11:13:44+02:00" level=debug msg="Content-Type from manifest GET is \"application/vnd.docker.distribution.manifest.v2+json\""
time="2021-08-23T11:13:44+02:00" level=debug msg="Using blob info cache at /home/marc/.local/share/containers/storage/cache/blob-info-cache-v1.boltdb"
time="2021-08-23T11:13:44+02:00" level=debug msg="IsRunningImageAllowed for image docker:ourregistry.com/project/pdf-service:latest"
time="2021-08-23T11:13:44+02:00" level=debug msg=" Using default policy section"
time="2021-08-23T11:13:44+02:00" level=debug msg=" Requirement 0: allowed"
time="2021-08-23T11:13:44+02:00" level=debug msg="Overall: allowed"
time="2021-08-23T11:13:44+02:00" level=debug msg="Downloading /v2/project/pdf-service/blobs/sha256:7e289c0da1cbae51ba2d3b18c4f5fddf070e7da1ba28bec0d3c31d12a31b8ca9"
time="2021-08-23T11:13:44+02:00" level=debug msg="GET https://ourregistry.com/v2/project/pdf-service/blobs/sha256:7e289c0da1cbae51ba2d3b18c4f5fddf070e7da1ba28bec0d3c31d12a31b8ca9"
time="2021-08-23T11:13:44+02:00" level=debug msg="Error pulling image ref //ourregistry.com/project/pdf-service:latest: Error parsing image configuration: json: cannot unmarshal string into Go struct field ImageConfig.config.Entrypoint of type []string"
  json: cannot unmarshal string into Go struct field ImageConfig.config.Entrypoint of type []string
Error: Error parsing image configuration: json: cannot unmarshal string into Go struct field ImageConfig.config.Entrypoint of type []string

After tinkering with the login for a bit, I replayed the last request with curl and received this:

$ curl -vu 'username:password' 'https://ourregistry.com/v2/project/pdf-service/blobs/sha256:7e289c0da1cbae51ba2d3b18c4f5fddf070e7da1ba28bec0d3c31d12a31b8ca9'

{
  "architecture": "amd64",
  "config": {
    "Entrypoint": "/nix/store/f0wxd5d8ydsdrrscnd5yz6axkwfk71b4-pdf-service/bin/pdf-service",
    "Env": [
      "TMPDIR=/",
      "LATEXMK=/nix/store/dlqzpjxb8yh9pw1hzwbvpsw7gzysz1fb-texlive-combined-2019/bin/latexmk"
    ],
    "ExposedPorts": {
      "8080/tcp": {}
    }
  },
  "created": "1970-01-01T00:00:01Z",
  "os": "linux",
  "rootfs": {
    "diff_ids": [
      "sha256:967c32a61522019bc7cc3e4b917e920c9052e192cde8e8d09a9e3533cfff3cf7"
    ],
    "type": "layers"
  },
  "history": [
    {
      "created": "1970-01-01T00:00:01Z"
    }
  ]
}

This looks like the JSON file that is part of the container.

tomberek commented 3 years ago

Well, I presume that is wrong right there. The entrypoint is not a list: https://pkg.go.dev/github.com/containers/podman/v2/libpod#ContainerImageConfig
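A rough way to catch this before pushing is to check the config blob for fields that strict parsers expect to be lists of strings. The Python sketch below is only an illustration: the function name non_list_fields is hypothetical, and checking Cmd in addition to Entrypoint is my assumption based on the OCI image config format, where both are arrays of strings.

```python
import json

# Fields of "config" that are expected to be lists of strings (assumption:
# Cmd is included because the OCI image config defines it the same way).
LIST_FIELDS = ("Entrypoint", "Cmd")


def non_list_fields(config_text):
    """Return the names of config fields that are set but are not lists of strings."""
    image = json.loads(config_text)
    container_config = image.get("config") or {}
    bad = []
    for name in LIST_FIELDS:
        value = container_config.get(name)
        if value is None:
            continue  # unset fields are fine
        is_string_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if not is_string_list:
            bad.append(name)
    return bad


# Abbreviated copy of the config blob shown above.
EXAMPLE_CONFIG = """
{
  "architecture": "amd64",
  "config": {
    "Entrypoint": "/nix/store/f0wxd5d8ydsdrrscnd5yz6axkwfk71b4-pdf-service/bin/pdf-service",
    "Env": ["TMPDIR=/"]
  },
  "os": "linux"
}
"""

if __name__ == "__main__":
    print(non_list_fields(EXAMPLE_CONFIG))
```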

mbr commented 3 years ago

> Well, I presume that is wrong right there. The entrypoint is not a list:

That turned out to be correct, thank you!

docker will happily accept a string as an entrypoint, but podman insists on a list at the same point. Changing the Entrypoint from "foo" to [ "foo" ] worked like a charm.
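For images that are already built, the same change can be sketched on the config JSON itself. This is a hypothetical helper (wrap_string_fields is not part of dockerTools or any Docker tooling) that wraps a bare string in a one-element list, mirroring the "foo" to [ "foo" ] change described above. Editing the blob changes its digest, so the manifest would have to be regenerated as well; fixing the Nix expression is the cleaner route.

```python
import copy


def wrap_string_fields(image, fields=("Entrypoint", "Cmd")):
    """Return a copy of the image config with bare-string fields wrapped in a list."""
    fixed = copy.deepcopy(image)  # leave the caller's dict untouched
    container_config = fixed.get("config") or {}
    for name in fields:
        value = container_config.get(name)
        if isinstance(value, str):
            # Assumption: the string is a single executable path, not a shell line.
            container_config[name] = [value]
    return fixed


if __name__ == "__main__":
    broken = {"config": {"Entrypoint": "foo", "Env": ["TMPDIR=/"]}}
    print(wrap_string_fields(broken))
```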

tomberek commented 3 years ago

> > Well, I presume that is wrong right there. The entrypoint is not a list:
>
> That turned out to be correct, thank you!
>
> docker will happily accept a string as an entrypoint, but podman insists on a list at the same point. Changing the Entrypoint from "foo" to [ "foo" ] worked like a charm.

Would you mind a chat about your use-case and Docker support in general?

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

sanmai-NL commented 10 months ago

@tomberek is this issue stale?