firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[Bug] The behaviour of MMDS(v1) has changed between v0.25.2 and v1.1.0 #3063

Closed richardcase closed 2 years ago

richardcase commented 2 years ago

Describe the bug

We rely heavily on the MMDS in Firecracker for cloud-init with the nocloud data source (i.e. ds=nocloud-net;s=http://169.254.169.254/latest/).

With Firecracker v0.25.2 this works, but after upgrading our solution to Firecracker v1.1.0, cloud-init fails because it can't communicate with the MMDS.

It appears there is a change in behaviour with MMDS between v0.25.2 and v1.1.0, even though we are still using MMDS v1.

To Reproduce

We use files for the configuration and metadata, and disable the API.

  1. Start Firecracker with a command similar to this:
    firecracker --id 01G8JZ10M7HYCEA4S3QPA4KWC6 --boot-timer --no-api --config-file /var/lib/flintlock/vm/default/fctest/01G8JZ10M7HYCEA4S3QPA4KWC6/firecracker.cfg --metadata /var/lib/flintlock/vm/default/fctest/01G8JZ10M7HYCEA4S3QPA4KWC6/metadata.json
  2. We use a config-file like this:
    {
      "drives": [
        {
          "drive_id": "root",
          "path_on_host": "/dev/mapper/flintlock-thinpool-snap-918",
          "is_root_device": true,
          "is_read_only": false,
          "cache_type": "Unsafe"
        }
      ],
      "boot-source": {
        "kernel_image_path": "/var/lib/containerd-dev/io.containerd.snapshotter.v1.native/snapshots/714/boot/vmlinux",
        "boot_args": "reboot=k panic=1 i8042.noaux ds=nocloud-net;s=http://169.254.169.254/latest/ i8042.dumbkbd network-config=dmVyc2lvbjogMgpldGhlcm5ldHM6CiAgICBldGgwOgogICAgICAgIG1hdGNoOgogICAgICAgICAgICBtYWNhZGRyZXNzOiBBQTpGRjowMDowMDowMDowMQogICAgICAgIGFkZHJlc3NlczoKICAgICAgICAgICAgLSAxNjkuMjU0LjE2OS4yNTMvMTYKICAgICAgICBkaGNwNDogZmFsc2UKICAgICAgICBkaGNwNjogZmFsc2UKICAgIGV0aDE6CiAgICAgICAgbWF0Y2g6CiAgICAgICAgICAgIG1hY2FkZHJlc3M6IDAyOmU1OjMwOjIxOjk5OjMwCiAgICAgICAgZGhjcDQ6IHRydWUKICAgICAgICBkaGNwNjogdHJ1ZQo= console=ttyS0 pci=off i8042.nomux i8042.nopnp"
      },
      "logger": {
        "log_path": "/var/lib/flintlock/vm/default/fctest/01G8JZDTNJW2QZYJYCV2AFMZ6Q/firecracker.log",
        "level": "Debug",
        "show_level": true,
        "show_log_origin": true
      },
      "machine-config": {
        "vcpu_count": 2,
        "mem_size_mib": 4096,
        "smt": true,
        "track_dirty_pages": false
      },
      "metrics": {
        "metrics_path": "/var/lib/flintlock/vm/default/fctest/01G8JZDTNJW2QZYJYCV2AFMZ6Q/firecracker.metrics"
      },
      "MmdsConfig": {
        "version": "V1",
        "network_interfaces": [
          "eth0"
        ]
      },
      "network-interfaces": [
        {
          "iface_id": "eth0",
          "host_dev_name": "fltap792fe46",
          "guest_mac": "AA:FF:00:00:00:01"
        },
        {
          "iface_id": "eth1",
          "host_dev_name": "fltapdfae3b6",
          "guest_mac": "02:e5:30:21:99:30"
        }
      ]
    }

The metadata file (edited slightly):

{
 "latest": {
  "meta-data": "instance_id: 01G8JZDTNJW2QZYJYCV2AFMZ6Q\n",
  "user-data": "## template: jinja\n#cloud-config\n\nhostname: fctest\n"
 }
}
  3. Check the stdout of Firecracker and look for the cloud-init logs.
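For reference, the network-config= value in the boot args above is base64-encoded netplan v2 YAML. Decoded (e.g. with base64 -d), it shows why eth0 comes up with the static link-local address 169.254.169.253/16 while eth1 uses DHCP:

version: 2
ethernets:
    eth0:
        match:
            macaddress: AA:FF:00:00:00:01
        addresses:
            - 169.254.169.253/16
        dhcp4: false
        dhcp6: false
    eth1:
        match:
            macaddress: 02:e5:30:21:99:30
        dhcp4: true
        dhcp6: true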

With Firecracker v1.1.0 we see the following network configuration in the boot logs:

+--------+-------+----------------------------+---------------+--------+-------------------+
| Device |   Up  |          Address           |      Mask     | Scope  |     Hw-Address    |
+--------+-------+----------------------------+---------------+--------+-------------------+
| bond0  | False |             .              |       .       |   .    | 02:87:fd:54:aa:f4 |
| dummy0 | False |             .              |       .       |   .    | ea:68:b6:b7:49:98 |
|  eth0  |  True |      169.254.169.253       |  255.255.0.0  | global | aa:ff:00:00:00:01 |
|  eth0  |  True |  fe80::a8ff:ff:fe00:1/64   |       .       |  link  | aa:ff:00:00:00:01 |
|  eth1  |  True |      192.168.122.184       | 255.255.255.0 | global | 02:e5:30:21:99:30 |
|  eth1  |  True | fe80::e5:30ff:fe21:9930/64 |       .       |  link  | 02:e5:30:21:99:30 |
|   lo   |  True |         127.0.0.1          |   255.0.0.0   |  host  |         .         |
|   lo   |  True |          ::1/128           |       .       |  host  |         .         |
+--------+-------+----------------------------+---------------+--------+-------------------+
++++++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++++
+-------+---------------+---------------+-----------------+-----------+-------+
| Route |  Destination  |    Gateway    |     Genmask     | Interface | Flags |
+-------+---------------+---------------+-----------------+-----------+-------+
|   0   |    0.0.0.0    | 192.168.122.1 |     0.0.0.0     |    eth1   |   UG  |
|   1   |  169.254.0.0  |    0.0.0.0    |   255.255.0.0   |    eth0   |   U   |
|   2   | 192.168.122.0 |    0.0.0.0    |  255.255.255.0  |    eth1   |   U   |
|   3   | 192.168.122.1 |    0.0.0.0    | 255.255.255.255 |    eth1   |   UH  |
+-------+---------------+---------------+-----------------+-----------+-------+

We also see this warning in the boot logs, since cloud-init cannot reach http://169.254.169.254/latest/:

[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloudNet'> failed

This worked with Firecracker v0.25.2. To confirm: the root volume and kernel are the same in both cases.

Expected behaviour

I would expect the behaviour of MMDS v1 to be the same as in previous versions, and if the behaviour has changed, that there are instructions on what additional steps need to be taken when migrating.

Environment

Additional context

How has this bug affected you?

This is blocking us from upgrading our solution to v1.1.0, which in turn blocks users of our solution from using v1.1.0.

What are you trying to achieve?

Use the latest version of Firecracker instead of v0.25.2.

Do you have any idea of what the solution might be?

Not really.

Checks

luminitavoicu commented 2 years ago

Hi @richardcase!

Thank you for reaching out to us and we are sorry that you are experiencing this problem!

In order for us to be able to reproduce this issue, it would be great if you could provide us with more information:

One other thing to note is that your host kernel (5.15) is not under Firecracker's support policy. We actively test Firecracker on 4.14 and 5.10 host and guest kernel versions and while other kernel versions may work, we do not offer any guarantees with respect to the unsupported versions.

richardcase commented 2 years ago

@luminitavoicu - thanks for the response :smile: I should've said that we use a forked version of v0.25.2 that has the macvtap feature and --metadata functionality merged in.

For the requested information:

is the firecracker version the only thing that has changed since experiencing this issue?

Yes. We are using the same volume, kernel, network configuration and the same host.

what does the network configuration in the boot logs look like for Firecracker v0.25?

+++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
+--------+-------+-----------------------------+---------------+--------+-------------------+
| Device |   Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
+--------+-------+-----------------------------+---------------+--------+-------------------+
| bond0  | False |              .              |       .       |   .    | c6:9e:62:fe:c8:d4 |
| dummy0 | False |              .              |       .       |   .    | b6:08:c3:05:51:39 |
|  eth0  |  True |         169.254.0.1         |  255.255.0.0  | global | aa:ff:00:00:00:01 |
|  eth0  |  True |   fe80::a8ff:ff:fe00:1/64   |       .       |  link  | aa:ff:00:00:00:01 |
|  eth1  |  True |       192.168.122.164       | 255.255.255.0 | global | 0a:17:42:02:26:5c |
|  eth1  |  True | fe80::817:42ff:fe02:265c/64 |       .       |  link  | 0a:17:42:02:26:5c |
|   lo   |  True |          127.0.0.1          |   255.0.0.0   |  host  |         .         |
|   lo   |  True |           ::1/128           |       .       |  host  |         .         |
+--------+-------+-----------------------------+---------------+--------+-------------------+
++++++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++++
+-------+---------------+---------------+-----------------+-----------+-------+
| Route |  Destination  |    Gateway    |     Genmask     | Interface | Flags |
+-------+---------------+---------------+-----------------+-----------+-------+
|   0   |    0.0.0.0    | 192.168.122.1 |     0.0.0.0     |    eth1   |   UG  |
|   1   |  169.254.0.0  |    0.0.0.0    |   255.255.0.0   |    eth0   |   U   |
|   2   | 192.168.122.0 |    0.0.0.0    |  255.255.255.0  |    eth1   |   U   |
|   3   | 192.168.122.1 |    0.0.0.0    | 255.255.255.255 |    eth1   |   UH  |
+-------+---------------+---------------+-----------------+-----------+-------+

what configuration do you do on the host prior to spawning Firecracker?

We create tap network interfaces and block devices.

what is the command used to fetch metadata from MMDS?

We don't send requests ourselves. Instead, cloud-init sends the requests based on the data source configured in the kernel arguments:

"boot-source": {
  "kernel_image_path": "/var/lib/containerd-dev/io.containerd.snapshotter.v1.native/snapshots/714/boot/vmlinux",
  "boot_args": "reboot=k panic=1 i8042.noaux ds=nocloud-net;s=http://169.254.169.254/latest/ i8042.dumbkbd network-config=dmVyc2lvbjogMgpldGhlcm5ldHM6CiAgICBldGgwOgogICAgICAgIG1hdGNoOgogICAgICAgICAgICBtYWNhZGRyZXNzOiBBQTpGRjowMDowMDowMDowMQogICAgICAgIGFkZHJlc3NlczoKICAgICAgICAgICAgLSAxNjkuMjU0LjE2OS4yNTMvMTYKICAgICAgICBkaGNwNDogZmFsc2UKICAgICAgICBkaGNwNjogZmFsc2UKICAgIGV0aDE6CiAgICAgICAgbWF0Y2g6CiAgICAgICAgICAgIG1hY2FkZHJlc3M6IDAyOmU1OjMwOjIxOjk5OjMwCiAgICAgICAgZGhjcDQ6IHRydWUKICAgICAgICBkaGNwNjogdHJ1ZQo= console=ttyS0 pci=off i8042.nomux i8042.nopnp"
 },

Specifically, this part: ds=nocloud-net;s=http://169.254.169.254/latest/
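With the nocloud-net data source, cloud-init just issues plain HTTP GET requests against that URL, so the failure can be reproduced by hand from inside the guest with something like (a sketch, assuming the default MMDS address):

curl http://169.254.169.254/latest/meta-data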

it would be useful if you could attach the Firecracker logs, so that we have more details about what is happening underneath

Running Firecracker v1.1.0:
2022-07-22T13:51:56.567318249 [01G8JZDTNJW2QZYJYCV2AFMZ6Q:main:INFO:src/vmm/src/resources.rs:188] Successfully added metadata to mmds from file
2022-07-22T13:51:56.590661072 [01G8JZDTNJW2QZYJYCV2AFMZ6Q:main:INFO:src/vmm/src/device_manager/mmio.rs:418] Artificially kick devices.
2022-07-22T13:51:56.590853895 [01G8JZDTNJW2QZYJYCV2AFMZ6Q:main:INFO:src/firecracker/src/main.rs:486] Successfully started microvm that was configured from one single json
2022-07-22T13:51:56.590915510 [01G8JZDTNJW2QZYJYCV2AFMZ6Q:main:WARN:src/devices/src/legacy/serial.rs:214] Detached the serial input due to peer close/error.
2022-07-22T13:51:56.897027001 [01G8JZDTNJW2QZYJYCV2AFMZ6Q:main:DEBUG:src/devices/src/virtio/block/event_handler.rs:35] block: activate event
2022-07-22T13:51:56.914303348 [01G8JZDTNJW2QZYJYCV2AFMZ6Q:main:DEBUG:src/devices/src/virtio/net/event_handler.rs:42] net: activate event
2022-07-22T13:51:56.914511158 [01G8JZDTNJW2QZYJYCV2AFMZ6Q:main:DEBUG:src/devices/src/virtio/net/event_handler.rs:42] net: activate event
alsrdn commented 2 years ago

@richardcase I'm unable to reproduce this issue. I have not tried cloud-init, but I did try to replicate the same network setup:

curl_put '/network-interfaces/net0' <<EOF
{
  "iface_id": "net0",
  "guest_mac": "aa:ff:00:00:00:01",
  "host_dev_name": "tap0"
}
EOF

curl_put '/network-interfaces/net1' <<EOF
{
  "iface_id": "net1",
  "guest_mac": "06:00:AC:10:00:02",
  "host_dev_name": "tap1"
}
EOF

curl_put '/mmds/config' <<EOF
{
  "version": "V1",
  "network_interfaces": [
   "net0"
  ]
}
EOF

curl_put '/mmds' <<EOF
{
  "latest": {
    "meta-data": {
      "ami-id": "ami-87654321",
      "reservation-id": "r-79054aef"
    }
  }
}
EOF
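Note that curl_put above is a small shell helper, not part of Firecracker itself. A minimal sketch of such a helper, assuming the API server listens on the conventional /tmp/firecracker.socket Unix socket:

curl_put() {
    # PUT the JSON body read from stdin to the given API path,
    # talking to the Firecracker API server over its Unix socket
    # (socket path is an assumption here).
    curl --unix-socket /tmp/firecracker.socket \
        -X PUT "http://localhost$1" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        -d @-
}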

Inside the microVM, I've set up the same address and routing as seen in your logs:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether aa:ff:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 169.254.169.253/16 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 06:00:ac:10:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.2/30 scope global eth1
       valid_lft forever preferred_lft forever
route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.0.1      0.0.0.0         UG    0      0        0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
172.16.0.0      0.0.0.0         255.255.255.252 U     0      0        0 eth1

The metadata service is accessible without issues:

root@ubuntu-fc-uvm:~# curl -s "http://169.254.169.254/latest"; echo
meta-data/
root@ubuntu-fc-uvm:~# arp -v
Address                  HWtype  HWaddress           Flags Mask            Iface
172.16.0.1               ether   c2:88:5f:3e:6a:51   C                     eth1
169.254.169.254          ether   06:01:23:45:67:01   C                     eth0
Entries: 2  Skipped: 0  Found: 2

Also tcpdump looks good to me:

13:22:15.400154 ARP, Request who-has 169.254.169.254 tell 169.254.169.253, length 28
13:22:15.400306 ARP, Reply 169.254.169.254 is-at 06:01:23:45:67:01 (oui Unknown), length 28
13:22:15.400314 IP 169.254.169.253.41860 > 169.254.169.254.80: Flags [S], seq 441642375, win 29200, options [mss 1460,sackOK,TS val 263515744 ecr 0,nop,wscale 4], length 0
13:22:15.400412 IP 169.254.169.254.80 > 169.254.169.253.41860: Flags [S.], seq 816363402, ack 441642376, win 2500, options [mss 1460], length 0
13:22:15.400442 IP 169.254.169.253.41860 > 169.254.169.254.80: Flags [.], ack 1, win 29200, length 0
13:22:15.400465 IP 169.254.169.253.41860 > 169.254.169.254.80: Flags [P.], seq 1:86, ack 1, win 29200, length 85: HTTP: GET /latest HTTP/1.1
13:22:15.400526 IP 169.254.169.254.80 > 169.254.169.253.41860: Flags [.], seq 1:129, ack 86, win 2500, length 128: HTTP: HTTP/1.1 200

Can you verify the output of tcpdump on one of the microVMs? Also, can you look at the ARP cache on one of the failed microVMs?

Not sure how your eth0 network is set up, but maybe some of your microVMs have a stale ARP cache. The default address for MMDS is 169.254.169.254, which may have been assigned to another machine on the network that eth0 is connected to, since cloud-init apparently assigns an address from that /16 range. Does the problem persist if you use a different IP address for MMDS and add a route to it?
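A sketch of that last suggestion, using the MMDS config's ipv4_address field with a hypothetical alternative link-local address 169.254.170.2:

curl_put '/mmds/config' <<EOF
{
  "version": "V1",
  "ipv4_address": "169.254.170.2",
  "network_interfaces": [
   "net0"
  ]
}
EOF

And inside the guest, route that address via the MMDS-enabled interface:

ip route add 169.254.170.2 dev eth0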

richardcase commented 2 years ago

Thanks for the response @alsrdn. I will try what you said and respond here.

A few comments:

richardcase commented 2 years ago

@alsrdn - turns out the issue was with this in the configuration file we use:

 "MmdsConfig": {
  "version": "V1",
  "network_interfaces": [
   "eth0"
  ]
 },

This should have been:

 "mmds-config": {
  "version": "V1",
  "network_interfaces": [
   "eth0"
  ]
 },
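Since the misspelled key was accepted without any error or warning (the microVM started successfully), a quick sanity check on the config file can catch this (a sketch, assuming jq is available):

jq 'has("mmds-config")' /var/lib/flintlock/vm/default/fctest/01G8JZ10M7HYCEA4S3QPA4KWC6/firecracker.cfg

This prints true only when the MMDS section is keyed correctly.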

I will create a separate issue for this and close this one. Thanks for looking into this @alsrdn & @luminitavoicu

jeffwidman commented 1 year ago

@richardcase did you ever create a separate issue for this? Or no need?

richardcase commented 1 year ago

I haven't created it... I forgot, in all honesty. I will create one.