firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[:ERROR:devices/src/virtio/net.rs:305] Failed to write to tap #746

Closed rmzg closed 5 years ago

rmzg commented 5 years ago

I've been attempting to perform some basic firecracker tests and this is my setup so far:

firecracker-v0.11.0 pre-built binary on Linux dl380 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08) x86_64 GNU/Linux

firecracker running as root via sudo.

Setup commands:

curl --unix-socket /tmp/firecracker.socket -i \
    -X PUT 'http://localhost/boot-source' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "kernel_image_path": "./hello-vmlinux.bin",
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
    }'

curl --unix-socket /tmp/firecracker.socket -i \
    -X PUT 'http://localhost/drives/rootfs' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "drive_id": "rootfs",
        "path_on_host": "./hello-rootfs.ext4",
        "is_root_device": true,
        "is_read_only": false
    }'

curl --unix-socket /tmp/firecracker.socket -i \
    -X PUT 'http://localhost/network-interfaces/0' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "host_dev_name": "tap1",
        "iface_id": "0",
        "state": "Attached"
    }'

curl --unix-socket /tmp/firecracker.socket -i \
    -X PUT 'http://localhost/actions' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "action_type": "InstanceStart"
    }'

All of which reported the usual 204 responses.

ip link ls on the host:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp3s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:1e:a1:07:51:c2 brd ff:ff:ff:ff:ff:ff
3: enp3s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:1e:a1:07:51:c4 brd ff:ff:ff:ff:ff:ff
4: enp4s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:1e:a1:07:51:c6 brd ff:ff:ff:ff:ff:ff
5: enp4s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:1e:a1:07:51:c8 brd ff:ff:ff:ff:ff:ff
6: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:02:c9:55:9e:78 brd ff:ff:ff:ff:ff:ff
7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:02:c9:55:9e:78 brd ff:ff:ff:ff:ff:ff
8: tap1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4a:b9:64:a3:91:c0 brd ff:ff:ff:ff:ff:ff

Inside the VM I did some basic ip commands to set the link up and give it an address and ended up with the following config:

localhost:~# ip link ls
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 2e:40:19:82:65:dd brd ff:ff:ff:ff:ff:ff
localhost:~# ip addr ls
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 2e:40:19:82:65:dd brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.81/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2c40:19ff:fe82:65dd/64 scope link
       valid_lft forever preferred_lft forever
localhost:~# ip route
default via 192.168.1.81 dev eth0
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.81

This was giving me the following error:

2018-12-05T14:45:16.898997365 [:ERROR:devices/src/virtio/net.rs:305] Failed to write to tap: Os { code: 5, kind: Other, message: "I/O error" }

UPDATE As I was writing this I noticed the TAP device was set to 'down', so I ran sudo ip link set tap1 up. I no longer get the above error, but from inside the VM I still can't ping anywhere.

UPDATE 2 Once I run ip link set tap1 master br0 everything is happy days.

This is basically a very long-winded way to say:

A) tap devices should probably be set to 'up' if they're being created
B) The documentation for what you're supposed to pass as an argument to PUT /network-interfaces/ is unclear (I'll try to submit a patch)
C) The error message when you send the wrong value is REALLY unclear ("fault_message": "Cannot open TAP device. Invalid name/permissions. CreateTap(Os { code: 22, kind: InvalidInput, message: Invalid argument })"). Not sure how to fix that one
D) should the NetworkConfig object take some args to attach to a bridge automatically or is this all documented someplace else?

alxiord commented 5 years ago

Hi! I'll address the questions in order:

A) tap devices should probably be set to 'up' if they're being created

That's up to whomever creates them in the first place and has nothing to do with Firecracker.

B) The documentation for what you're supposed to pass as an argument to PUT /network-interfaces/ is unclear (I'll try to submit a patch)

The yaml contains the definition of what's supposed to go in the /network-interfaces payload but indeed there are no examples. #711 tracks this. Patches are welcome :slightly_smiling_face:

C) The error message when you send the wrong value is REALLY unclear ("fault_message": "Cannot open TAP device. Invalid name/permissions. CreateTap(Os { code: 22, kind: InvalidInput, message: Invalid argument })"). Not sure how to fix that one

Improving error messages is on our to-do list. Patches are welcome here too. In your particular case, the error is CreateTap, meaning Firecracker wasn't able to open the tap device passed through the API, and error code 22 is EINVAL, "Invalid argument" (see errno definitions).

D) should the NetworkConfig object take some args to attach to a bridge automatically or is this all documented someplace else?

No, the /network-interfaces arguments are only the ones in the yaml. If you mean to connect the tap interface to a bridge on the host, that also has nothing to do with the Firecracker process. It's up to the user to create and configure everything on the host side before passing it to Firecracker.
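For illustration, here is a minimal sketch of that host-side preparation, using the tap1 and br0 names from the original report. The helper name setup_tap, its arguments, and the RUN dry-run switch are hypothetical conveniences, not part of Firecracker. The block prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch: prepare a tap device on the host before handing it to Firecracker.
# RUN defaults to "echo", so the commands are only printed (dry run).
# Set RUN=sudo to actually execute them.
RUN="${RUN:-echo}"

# setup_tap DEV BRIDGE OWNER  (hypothetical helper, not a Firecracker tool)
setup_tap() {
    dev="$1"; bridge="$2"; owner="$3"
    # Create the tap device, owned by the user that will run Firecracker.
    $RUN ip tuntap add dev "$dev" mode tap user "$owner"
    # Bring the link up; in this thread a tap left DOWN produced
    # the "Failed to write to tap" I/O error.
    $RUN ip link set "$dev" up
    # Attach the tap to an existing bridge so the guest can reach the network.
    $RUN ip link set "$dev" master "$bridge"
}

setup_tap tap1 br0 "$(id -un)"
```

Attaching to a bridge is only one of many possible host setups; the last command can be replaced by whatever routing or NAT configuration fits the host.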

raduweiss commented 5 years ago

To add a bit more context here, there are many ways to set up host networking for a group of Firecracker microVMs, and each user's setup can look quite different. We picked the TAP device as a primitive specifically because it works with a lot of possible host network setups.

We want to be very opinionated on things like providing proper security and isolation, but for things like how to configure network and storage, we recognize that we can't really know what people will want, and so we just provide the primitives.

That being said, we'll definitely want our docs to have good examples of simple setups for network, storage, and logging.

rmzg commented 5 years ago

A) tap devices should probably be set to 'up' if they're being created

That's up to whomever creates them in the first place and has nothing to do with Firecracker.

Are you sure the tap device is created outside firecracker? When I re-run the put network-interfaces/0 ... "state":"Attached" bit, firecracker seems to create the tap device for me. This is v0.11.0 in case something has changed since then, but I would assume if I tell it to create a tap device and attach it to my instance, it should probably default to up instead of down. This is also where I get the idea for being able to pass other options like a master bridge and so forth.

And on an only very tangentially related note, when I run reboot from the console of my firecracker vm (following the steps of the demo in the faq), it exits but doesn't delete the /tmp/firecracker.socket so that subsequent invocations of firecracker fail because the socket already exists.

Is this an error on my part somewhere or what?

acatangiu commented 5 years ago

@rmzg

Are you sure the tap device is created outside firecracker?

The tun/tap system call used to open a tap device opens an existing device, or attempts to create one if it does not exist. Running firecracker as a privileged process, or setting the CAP_NET_ADMIN capability on the firecracker binary, results in the firecracker process actually creating the tap.

The recommended method of running firecracker though is unprivileged and providing an already-created tap interface.
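As a rough sketch of how a launcher script could verify that the tap already exists and is administratively up before configuring the microVM, the check below reads the interface flags from sysfs. The function name tap_is_up and the SYS_NET override are hypothetical; the only facts relied on are that Linux exposes /sys/class/net/<dev>/flags and that IFF_UP is bit 0x1.

```shell
#!/bin/sh
# SYS_NET is overridable only so the check can be tried against a fake tree.
SYS_NET="${SYS_NET:-/sys/class/net}"

# tap_is_up DEV: succeeds if DEV exists and has IFF_UP (bit 0x1) set.
# (hypothetical helper, not part of Firecracker)
tap_is_up() {
    flags_file="$SYS_NET/$1/flags"
    [ -r "$flags_file" ] || return 1     # interface does not exist
    flags=$(cat "$flags_file")           # hex value, e.g. 0x1003
    [ $(( $flags & 1 )) -eq 1 ]          # test the IFF_UP bit
}

if tap_is_up tap1; then
    echo "tap1 exists and is up"
else
    echo "tap1 is missing or down" >&2
fi
```

This only tests the administrative UP flag. It says nothing about carrier state or whether the tap is attached to a bridge.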

And on an only very tangentially related note, when I run reboot from the console of my firecracker vm (following the steps of the demo in the faq), it exits but doesn't delete the /tmp/firecracker.socket so that subsequent invocations of firecracker fail because the socket already exists.

This is not an error, it is by design. When running with seccomp syscall filtering enabled, deleting any files on the host system is disabled.

Note that seccomp filters are not enabled by default in v0.11.0, but will be starting with the next release which should land soon.

rmzg commented 5 years ago

Ah, ok, I understand more about the net/tap thing now. When I first tried it I didn't realize I needed to pass it an existing tap device so I ended up giving the binary cap_net_admin.

Would it be possible/desirable to disable the part where it might create a tap device and just have it always throw an error if it doesn't exist?

Not deleting host files makes a lot of sense for security, but the end result of not being able to run firecracker twice in a row without errors seems pretty awkward. Am I invoking it incorrectly? Does this work better if I specify the socket name explicitly?

alxiord commented 5 years ago

Hi @rmzg,

Would it be possible/desirable to disable the part where it might create a tap device and just have it always throw an error if it doesn't exist?

Yes, that makes sense. Logged #754 to track this.

Am I invoking it incorrectly? Does this work better if I specify the socket name explicitly?

No, you are invoking it correctly. I'm not sure what you mean by specifying the socket name explicitly. Either way, Firecracker does not clean up the socket file, so it has to be removed manually after each run, even if you want to reuse the same path.
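One way to handle this is a small wrapper that removes the stale socket before and after each run. This is a sketch only: the wrapper name run_with_clean_socket is hypothetical, and the default path is the /tmp/firecracker.socket used earlier in this thread.

```shell
#!/bin/sh
# Path of the API socket; defaults to the one used in this thread.
API_SOCK="${API_SOCK:-/tmp/firecracker.socket}"

# run_with_clean_socket CMD [ARGS...]  (hypothetical wrapper)
# Removes a leftover socket, runs CMD, removes the socket again,
# and returns CMD's exit status.
run_with_clean_socket() {
    rm -f "$API_SOCK"      # socket left behind by a previous run, if any
    status=0
    "$@" || status=$?      # run the given command and remember its status
    rm -f "$API_SOCK"      # Firecracker does not remove it on exit
    return "$status"
}

# Example invocation (not executed here; the binary path is an assumption):
#   run_with_clean_socket ./firecracker
```

Because the cleanup happens in the wrapper process, it is unaffected by any seccomp filtering applied inside Firecracker itself.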