firecracker-microvm / firecracker-containerd

firecracker-containerd enables containerd to manage containers as Firecracker microVMs
Apache License 2.0
2.15k stars 180 forks source link

failed to dial the VM over vsock #374

Closed bchalios closed 4 years ago

bchalios commented 4 years ago

I am experiencing an error when trying to follow the Getting Started steps from here.

The symptoms of the problem are the following:

I start firecracker-containerd like this:

$ sudo firecracker-containerd --config /etc/firecracker-containerd/config.toml
INFO[2020-01-06T13:25:42.809110705-06:00] starting containerd                           revision= version=1.3.1+unknown
INFO[2020-01-06T13:25:42.852238578-06:00] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2020-01-06T13:25:42.852295150-06:00] loading plugin "io.containerd.snapshotter.v1.devmapper"...  type=io.containerd.snapshotter.v1
INFO[2020-01-06T13:25:42.852348007-06:00] initializing pool device "fc-dev-thinpool"
INFO[2020-01-06T13:25:42.854467085-06:00] using dmsetup:
Library version:   1.02.137 (2016-11-30)
Driver version:    4.35.0
INFO[2020-01-06T13:25:42.898182570-06:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2020-01-06T13:25:42.898320055-06:00] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
INFO[2020-01-06T13:25:42.898355955-06:00] metadata content store policy set             policy=shared
INFO[2020-01-06T13:25:42.898500766-06:00] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2020-01-06T13:25:42.898534924-06:00] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2020-01-06T13:25:42.898586864-06:00] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.898614061-06:00] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.898639184-06:00] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.898665799-06:00] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.898691323-06:00] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.898723173-06:00] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.898747605-06:00] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.898770356-06:00] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2020-01-06T13:25:42.898829218-06:00] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
DEBU[2020-01-06T13:25:42.898898750-06:00] loading tasks in namespace                    namespace=default
DEBU[2020-01-06T13:25:42.898972245-06:00] loading tasks in namespace                    namespace=fc
INFO[2020-01-06T13:25:42.899065812-06:00] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2020-01-06T13:25:42.900674480-06:00] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2020-01-06T13:25:42.900747354-06:00] loading plugin "io.containerd.service.v1.fc-control"...  type=io.containerd.service.v1
DEBU[2020-01-06T13:25:42.900771709-06:00] initializing fc-control plugin (root: "/var/lib/firecracker-containerd/containerd/io.containerd.service.v1.fc-control")
INFO[2020-01-06T13:25:42.901255918-06:00] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2020-01-06T13:25:42.901330072-06:00] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901356617-06:00] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901379681-06:00] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901401891-06:00] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901425364-06:00] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901449594-06:00] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901472131-06:00] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901494624-06:00] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901517491-06:00] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2020-01-06T13:25:42.901567900-06:00] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901594498-06:00] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901624916-06:00] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901645016-06:00] loading plugin "io.containerd.grpc.v1.fc-control-service"...  type=io.containerd.grpc.v1
DEBU[2020-01-06T13:25:42.901661400-06:00] initializing fc-control-service plugin
INFO[2020-01-06T13:25:42.901681520-06:00] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2020-01-06T13:25:42.901959154-06:00] serving...                                    address=/run/firecracker-containerd/containerd.sock.ttrpc
INFO[2020-01-06T13:25:42.902018485-06:00] serving...                                    address=/run/firecracker-containerd/containerd.sock
INFO[2020-01-06T13:25:42.902037481-06:00] containerd successfully booted in 0.093713s
DEBU[2020-01-06T13:25:43.030443321-06:00] garbage collected                             d=31.57741ms

then on a different terminal I try to launch a container:

sudo firecracker-ctr --address /run/firecracker-containerd/containerd.sock run --snapshotter devmapper --runtime aws.firecracker --rm --tty  docker.io/library/busybox:latest busybox-test
ctr: aws.firecracker: unexpected error from CreateVM: rpc error: code = Unknown desc = shim CreateVM returned error: rpc error: code = Unknown desc = failed to create VM: failed to dial the VM over vsock: non-temporary vsock dial failure: dial unix firecracker.vsock: connect: connection refused
: exit status 1: unknown

And here are the logs of firecracker-containerd when I try to launch the container:

DEBU[2020-01-06T13:26:19.781674796-06:00] stat snapshot                                 key="sha256:195be5f8be1df6709dafbba7ce48f2eee785ab7775b88e0c115d8205407265c5"
DEBU[2020-01-06T13:26:19.781756046-06:00] stat                                          key="default/2/sha256:195be5f8be1df6709dafbba7ce48f2eee785ab7775b88e0c115d8205407265c5"
DEBU[2020-01-06T13:26:19.822286797-06:00] prepare snapshot                              key=busybox-test parent="sha256:195be5f8be1df6709dafbba7ce48f2eee785ab7775b88e0c115d8205407265c5"
DEBU[2020-01-06T13:26:19.822424120-06:00] prepare                                       key=default/17/busybox-test parent="default/2/sha256:195be5f8be1df6709dafbba7ce48f2eee785ab7775b88e0c115d8205407265c5"
DEBU[2020-01-06T13:26:19.822532808-06:00] creating snapshot device 'fc-dev-thinpool-snap-19' from 'fc-dev-thinpool-snap-1'
DEBU[2020-01-06T13:26:20.416169643-06:00] event published                               ns=default topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2020-01-06T13:26:20.421942607-06:00] get snapshot mounts                           key=busybox-test
DEBU[2020-01-06T13:26:20.422016468-06:00] mounts                                        key=default/17/busybox-test
DEBU[2020-01-06T13:26:20.841094189-06:00] event published                               ns=default topic=/containers/create type=containerd.events.ContainerCreate
DEBU[2020-01-06T13:26:20.866997226-06:00] get snapshot mounts                           key=busybox-test
DEBU[2020-01-06T13:26:20.867063228-06:00] mounts                                        key=default/17/busybox-test
time="2020-01-06T13:26:20.904573963-06:00" level=debug msg=StartShim runtime=aws.firecracker task_id=busybox-test
time="2020-01-06T13:26:20.904780432-06:00" level=info msg="will start a single-task VM since no VMID has been provided" runtime=aws.firecracker task_id=busybox-test vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
DEBU[2020-01-06T13:26:20.905584413-06:00] create VM request: VMID:"35baf542-9a3a-4dc7-838a-4a29493a03ee" ContainerCount:1 ExitAfterAllTasksDeleted:true
DEBU[2020-01-06T13:26:20.905640394-06:00] using namespace: default
DEBU[2020-01-06T13:26:20.905926540-06:00] starting containerd-shim-aws-firecracker      vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:20.955377528-06:00] starting signal loop                          namespace=default path=/var/lib/firecracker-containerd/shim-base/default/35baf542-9a3a-4dc7-838a-4a29493a03ee pid=9103
INFO[2020-01-06T13:26:20.955490640-06:00] creating new VM                               runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.173029544-06:00] Called startVMM(), setting up a VMM on firecracker.sock  runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.185870629-06:00] refreshMachineConfiguration: [GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate:T2 HtEnabled:0xc00041452b MemSizeMib:0xc000414520 VcpuCount:0xc000414518}  runtime=aws.firecracker vmID=35baf542-9a3
a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.186221134-06:00] PutGuestBootSource: [PUT /boot-source][204] putGuestBootSourceNoContent   runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.186243604-06:00] Attaching drive /var/lib/firecracker-containerd/runtime/default-rootfs.img, slot root_drive, root true.  runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.186726743-06:00] Attached drive /var/lib/firecracker-containerd/runtime/default-rootfs.img: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent   runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.186748544-06:00] Attaching drive /var/lib/firecracker-containerd/shim-base/default/35baf542-9a3a-4dc7-838a-4a29493a03ee/ctrstub0, slot MN2HE43UOVRDA, root false.  runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.187047353-06:00] Attached drive /var/lib/firecracker-containerd/shim-base/default/35baf542-9a3a-4dc7-838a-4a29493a03ee/ctrstub0: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent   runtime=aws.firecracker vmID=35baf542-9a3a
-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.187065576-06:00] Attaching NIC tap0 (hwaddr b2:86:89:8d:5d:36) at index 1  runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.205907619-06:00] startInstance successful: [PUT /actions][204] createSyncActionNoContent   runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
INFO[2020-01-06T13:26:21.205941244-06:00] calling agent                                 runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
DEBU[2020-01-06T13:26:21.757521869-06:00] garbage collected                             d=40.045862ms
INFO[2020-01-06T13:26:22.582939019-06:00] firecracker exited: status=0                  runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
ERRO[2020-01-06T13:26:22.606135657-06:00]                                               attempt=6 error="non-temporary vsock dial failure: dial unix firecracker.vsock: connect: connection refused" runtime=aws.firecracker vmID=35baf542-9a3a-4dc7-838a-4a294
93a03ee
ERRO[2020-01-06T13:26:22.606217649-06:00]                                               error="failed to create VM: failed to dial the VM over vsock: non-temporary vsock dial failure: dial unix firecracker.vsock: connect: connection refused" runtime=aws.f
irecracker vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
ERRO[2020-01-06T13:26:22.606639041-06:00]                                               error="shim CreateVM returned error: rpc error: code = Unknown desc = failed to create VM: failed to dial the VM over vsock: non-temporary vsock dial failure: dial uni
x firecracker.vsock: connect: connection refused"
DEBU[2020-01-06T13:26:22.610914439-06:00] remove snapshot                               key=busybox-test
DEBU[2020-01-06T13:26:22.633826326-06:00] shim has been terminated                      error="signal: killed" vmID=35baf542-9a3a-4dc7-838a-4a29493a03ee
DEBU[2020-01-06T13:26:22.649096299-06:00] event published                               ns=default topic=/snapshot/remove type=containerd.events.SnapshotRemove
DEBU[2020-01-06T13:26:22.674058633-06:00] event published                               ns=default topic=/containers/delete type=containerd.events.ContainerDelete
DEBU[2020-01-06T13:26:23.532245183-06:00] schedule snapshotter cleanup                  snapshotter=devmapper
DEBU[2020-01-06T13:26:23.532351786-06:00] walk
DEBU[2020-01-06T13:26:23.532456222-06:00] remove                                        key=default/17/busybox-test
DEBU[2020-01-06T13:26:23.992151904-06:00] removed snapshot                              key=default/17/busybox-test snapshotter=devmapper
DEBU[2020-01-06T13:26:23.992278555-06:00] snapshot garbage collected                    d=459.943691ms snapshotter=devmapper
DEBU[2020-01-06T13:26:23.992326580-06:00] garbage collected                             d=19.804667ms

My machine has the following specs:

$ uname -a
Linux baremetal.nubificus.com 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64 GNU/Linux
$ cat /etc/debian_version
9.9
$ sudo firecracker --version
firecracker 0.19.1
$ sudo firecracker-containerd --version
containerd github.com/containerd/containerd 1.3.1+unknown
$ sudo firecracker-ctr --version
ctr github.com/containerd/containerd 1.3.1+unknown
$ sudo docker --version
Docker version 19.03.1, build 74b1e89

I have spoken with @nmeyerhans which pointed out, that my kernel is not officially supported by firecracker, and that could be a potential problem. In the meantime, I reproduced the problem on a raspberry pi 4. I managed to build firecracker containerd as is on the board and firecracker 0.20.1 (which is needed in order to run on RPi4)

kzys commented 4 years ago

My EC2 Debian box is 4.9.0, and firecracker-containerd is working fine. The exact version seems slightly newer than what you have though.

% uname -a
Linux ip-172-31-27-222 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3 (2019-09-02) x86_64 GNU/Linux

Can you upgrade the Linux kernel on the x86 host?

Also what is the kernel version the Raspberry Pi host is using? Did you get the exact same error?

bchalios commented 4 years ago

Hi there,

I upgraded the kernel:

$ uname -r
5.5.0-rc3

and I still get the same error. It could very well be that I 'm doing something wrong with my setup, but I can't figure out what.

bchalios commented 4 years ago

Sorry, the rpi4 runs on this:

uname -r
4.19.75-v8+
bchalios commented 4 years ago

ok, fixed, it myself.

The problem was not with the environment. The problem was with the rootfs.img. I had a problem with my docker setup, so I could not run make image from the topdir. Instead I had run make rootfs.img from inside tools/image-builder. That image was not working properly, for some reason. Does that make sense?

kzys commented 4 years ago

Hmm, what was the error you got from make image? We'd like to fix the issue if we can.

kzys commented 4 years ago

Resolving. Please feel free to ping us regarding the make image issue!