coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

docker run fails with containerd-1.6.23-1 #1578

Closed fifofonix closed 7 months ago

fifofonix commented 9 months ago

Describe the bug

As detailed in the Bugzilla report linked below, basic docker run commands fail on task creation.

https://bugzilla.redhat.com/show_bug.cgi?id=2239849

Reproduction steps

  1. Commission new FCOS next node
  2. docker run -d nginx (Errors)
  3. docker ps shows no running process

Expected behavior

Nginx container launches as evidenced in a docker ps command.

Actual behavior

Docker run fails on task creation. No running container.

System details

Butane or Ignition config

No response

Additional information

Ignition has some proxy declaration but that's it.

No crypto modifications or changes to docker settings etc.

travier commented 9 months ago

We would really benefit from having Docker tests.

travier commented 9 months ago

Made a very basic test in https://github.com/coreos/fedora-coreos-config/pull/2622

jlebon commented 9 months ago

For reference, the error is:

[root@cosa-devsh ~]# docker run -d nginx
Unable to find image 'nginx:latest' locally
latest: Pulling from library/nginx
360eba32fa65: Pull complete
c5903f3678a7: Pull complete
27e923fb52d3: Pull complete
72de7d1ce3a4: Pull complete
94f34d60e454: Pull complete
e42dcfe1730b: Pull complete
907d1bb4e931: Pull complete
Digest: sha256:112b224f9d7f74ac22211c70108f4328c80be7eb2768f7ee6ace6b120fbaf593
Status: Downloaded newer image for nginx:latest
b3801713762f7e61151fc0a96ed919ca8ee1661df6822d2a80117e21ba69307d
docker: Error response from daemon: failed to create task for container: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown.

jlebon commented 9 months ago

Also appears to be an issue in Fedora Cloud at least: https://bugzilla.redhat.com/show_bug.cgi?id=2239849#c1
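Since the same message shows up on more than one Fedora variant, it can help to check captured output for it mechanically. The following is a minimal sketch, not part of any tool mentioned in this issue; the function name `has_ttrpc_marshal_error` is hypothetical, and the matched text is copied from the error quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from docker or containerd): succeed when the
# given text contains the ttrpc marshalling failure quoted in this issue.
has_ttrpc_marshal_error() {
    # $1: output captured from docker, ctr, containerd, or kubelet
    case "$1" in
        # The quoted part is matched literally, including the '*' before task.
        *"ttrpc: cannot marshal unknown type: *task.CreateTaskRequest"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

One possible use is to capture the output of the failing command, for example `out=$(docker run -d nginx 2>&1)`, and then call `has_ttrpc_marshal_error "$out"`.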

travier commented 9 months ago

From today's meeting:

  * AGREED: We'll pause the rollout of the next stream due to
    https://github.com/coreos/fedora-coreos-tracker/issues/1578
    (travier, 16:51:35)

basvdlei commented 9 months ago

My initial testing of Fedora CoreOS 39.20230916.1.1 also failed with a similar error in the kubelet and containerd. Note that this is a kubelet configured to use containerd as its runtime: --container-runtime-endpoint=unix:///run/containerd/containerd.sock.

containerd error:

containerd[1724]: time="2023-09-21T09:58:23.625852918Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-192.168.121.11,Uid:97fc473b4e71c3ac01e7b0468cde9a25,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown"

kubelet error:

kubelet[2665]: E0921 09:58:23.629804    2667 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown"

fifofonix commented 9 months ago

Thanks for this. Saves me testing this on our next k8s cluster this morning which was on my to-do list.

basvdlei commented 9 months ago

It seems the containerd package itself is broken. A clean instance with docker completely disabled can't run a container in containerd with its own ctr client.

Reproduction Ignition file:

```
variant: fcos
version: 1.1.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - xxx
systemd:
  units:
    - name: docker.service
      mask: true
    - name: docker.socket
      mask: true
    - name: containerd.service
      enabled: true
```

Trying to start a container with `ctr`:

```
[core@localhost ~]$ rpm -q --file /usr/bin/ctr
containerd-1.6.23-1.fc39.x86_64
[core@localhost ~]$ sudo /usr/bin/ctr image pull docker.io/library/hello-world:latest
docker.io/library/hello-world:latest: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:4f53e2564790c8e7856ec08e384732aa38dc43c52f02952483e3f003afbf23db: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:7e9b6e7ba2842c91cf49f3e214d04a7a496f8214356f41d81a6e6dcad11f11e3: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:719385e32844401d57ecfd3eacab360bf551a1491c05b85806ed8f1b08d792f6: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:9c7a54a9a43cca047013b82af109fe963fde787f63f9e016fdc3384500c2823d: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 2.3 s total: 1.9 Ki (861.0 B/s)
unpacking linux/amd64 sha256:4f53e2564790c8e7856ec08e384732aa38dc43c52f02952483e3f003afbf23db...
done: 49.348039ms
[core@localhost ~]$ sudo /usr/bin/ctr run docker.io/library/hello-world:latest hello
ctr: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown
```

dustymabe commented 9 months ago

I was able to downgrade to containerd-1.6.19-1.fc39 and things seem to work. I'll open a PR to revert to this version of containerd while investigation takes place on the new version.
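To tell at a glance whether a node still carries the failing build, the versions reported so far in this issue can be encoded in a small check. This is only a sketch: the function name `containerd_known_bad` is hypothetical, and it assumes that containerd-1.6.23-1 is the only affected build, which is all this thread establishes.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the package string names the build
# reported as failing in this issue (containerd-1.6.23-1).
# Assumption: no other build is affected; the thread only confirms this one
# as broken and 1.6.19 as working.
containerd_known_bad() {
    # $1: name-version-release string, such as the output of `rpm -q containerd`
    case "$1" in
        containerd-1.6.23-1.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `containerd_known_bad "$(rpm -q containerd)"` would succeed on a node that still has the 1.6.23-1 package installed.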

dustymabe commented 9 months ago

The fix for this went into next stream release 39.20230916.1.2. Please try out the new release and report issues.

travier commented 9 months ago

Re-opened as this is not truly "fixed". https://github.com/coreos/fedora-coreos-config/pull/2625 is a temporary workaround.

dustymabe commented 9 months ago

New BZ to follow (the other got closed out as duplicate): https://bugzilla.redhat.com/show_bug.cgi?id=2237396

stefangweichinger commented 8 months ago

I also have this issue with F39 workstation beta.

Downgrading helps:

sudo dnf install ~/Downloads/containerd-1.6.19-2.fc39.x86_64.rpm

dustymabe commented 8 months ago

Hi @stefangweichinger this is an issue tracker specific to Fedora CoreOS.

For fedora in general we are following the bug at https://bugzilla.redhat.com/show_bug.cgi?id=2237396. Can you add your comment there?

I'll mark these two comments as off-topic since this tracker is Fedora CoreOS specific.

stefangweichinger commented 8 months ago

Maybe you have to mark this one as well ... ? Thanks for the correction and the link, I posted there now.

webs397 commented 7 months ago

This same error seems to be present again with Fedora 39. I am unable to start any docker containers that worked before updating. Otherwise nothing changed with the containers. Example output:

docker start mynodered
Error response from daemon: failed to create task for container: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown

dustymabe commented 7 months ago

@webs397 are you running Fedora CoreOS? If so, what version of Fedora CoreOS are you using? (Please show the output of rpm-ostree status.) If not, you need to comment on and follow https://bugzilla.redhat.com/show_bug.cgi?id=2237396
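For anyone unsure which Fedora variant they are on, a quick check of the os-release file can answer that before commenting. A minimal sketch follows; the function name `is_fedora_coreos` is hypothetical, and it relies on the assumption (not stated in this thread) that Fedora CoreOS sets ID=fedora and VARIANT_ID=coreos in /etc/os-release.

```shell
#!/bin/sh
# Hypothetical helper: succeed when an os-release file describes Fedora CoreOS.
# Assumption (not from this thread): Fedora CoreOS sets ID=fedora and
# VARIANT_ID=coreos; optional double quotes around the values are tolerated.
is_fedora_coreos() {
    # $1: path to an os-release file (defaults to /etc/os-release)
    os_release_file="${1:-/etc/os-release}"
    grep -Eq '^ID="?fedora"?$' "$os_release_file" &&
        grep -Eq '^VARIANT_ID="?coreos"?$' "$os_release_file"
}
```

Called with no argument, it reads /etc/os-release. If it fails, the Bugzilla report linked above is the right place to follow up.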

webs397 commented 7 months ago

@dustymabe Oh, no I am on Fedora Workstation 39. Thanks for the response I will check it out!

wioch commented 7 months ago

I'm running a couple of Fedora Server 39 systems. With:

dustymabe commented 7 months ago

If you're not running Fedora CoreOS, please follow https://bugzilla.redhat.com/show_bug.cgi?id=2237396

sustmi commented 7 months ago

I confirm that downgrading containerd-1.6.23-1.fc39.x86_64 to containerd-1.6.19-2.fc39.x86_64 from https://koji.fedoraproject.org/koji/buildinfo?buildID=2236784 and running systemctl restart docker fixed the:

docker: Error response from daemon: failed to create task for container: failed to create shim task: ttrpc: cannot marshal unknown type: *task.CreateTaskRequest: unknown.

error on my Fedora 39 Workstation.

EDIT: Sorry, I only just realized that I duplicated an existing comment that was marked as off-topic. I found this issue as the first result in Google and did not realize which repository this is. :pensive:

dustymabe commented 7 months ago

Locking this issue for now to prevent new off-topic comments. Please follow https://bugzilla.redhat.com/show_bug.cgi?id=2237396

dustymabe commented 7 months ago

There is a proposed update to fix this problem in Fedora 39. I have opened a fast-track PR to get it into testing-devel.