firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

1.4.0 fails on Digital Ocean with `Failed to get brand string: Missing frequency` #3995

Closed sean-nicholas closed 1 year ago

sean-nicholas commented 1 year ago

Describe the bug

Version 1.4.0 fails on Digital Ocean Droplets with the following error:

{"fault_message":"Internal error while starting microVM: Error configuring the vcpu for boot: Failed to apply modifications to CPUID: Failed to apply modifications to Intel CPUID: Failed to get brand string: Missing frequency: [68, 79, 45, 80, 114, 101, 109, 105, 117, 109, 45, 73, 110, 116, 101, 108, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]."}

It seems that Firecracker assumes that the brand name contains a frequency, but on Digital Ocean the brand name / model name is:

$ grep -m 1 'model name' /proc/cpuinfo
model name      : DO-Premium-Intel

DO-Premium-Intel is the same as 68, 79, 45, 80, 114, 101, 109, 105, 117, 109, 45, 73, 110, 116, 101, 108 in ASCII:

$ for code in 68 79 45 80 114 101 109 105 117 109 45 73 110 116 101 108; do printf "\x$(printf %x $code)"; done; echo
DO-Premium-Intel
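The byte array in the fault message is padded with trailing zeros, so a slightly more general decoder can be handy. The following is only a sketch: `decode_brand` is a hypothetical helper name, and it assumes the array is pasted exactly as printed in the error (decimal byte values separated by commas, optionally wrapped in brackets). It stops at the first zero byte.

```shell
# Hypothetical helper: decode the decimal byte array from the fault message
# into text, stopping at the first NUL (0) padding byte.
decode_brand() {
    out=""
    # Strip the brackets and turn the commas into spaces so the shell can
    # iterate over the individual byte values.
    for code in $(printf '%s' "$1" | tr -d '[]' | tr ',' ' '); do
        [ "$code" -eq 0 ] && break
        # Convert the decimal value to a 3-digit octal escape and print it.
        out="$out$(printf "\\$(printf '%03o' "$code")")"
    done
    printf '%s\n' "$out"
}

decode_brand "[68, 79, 45, 80, 114, 101, 109, 105, 117, 109, 45, 73, 110, 116, 101, 108, 0, 0, 0, 0]"
```

Octal escapes are used instead of `\x` so the function does not depend on a bash-specific `printf`.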

Because no THz, GHz or MHz is found in the brand name, Err(DefaultBrandStringError::MissingFrequency(host_brand_string)) is returned.
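To check whether a given host is likely to hit this, one can test the model name for a frequency token. This is only a rough approximation of the check described above, not Firecracker's actual parsing code: `has_frequency` is a hypothetical helper, and the regular expression (a number directly followed by THz, GHz or MHz) is my assumption about what counts as a frequency.

```shell
# Hypothetical helper: succeeds if the brand string contains something that
# looks like a frequency, e.g. "1.10GHz". This approximates the behaviour
# described above; it is not Firecracker's real implementation.
has_frequency() {
    printf '%s\n' "$1" | grep -Eq '[0-9]+(\.[0-9]+)?[TGM]Hz'
}

if has_frequency "DO-Premium-Intel"; then
    echo "frequency found"
else
    echo "no frequency in brand string"
fi
```

On a live host the argument could come from `grep -m 1 'model name' /proc/cpuinfo | cut -d: -f2-`.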

This error does not occur in version 1.3.3, which works just fine.

To Reproduce

Create a droplet with the following configuration

Run setup.sh:

ARCH="$(uname -m)"

# Download a linux kernel binary
wget https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/${ARCH}/kernels/vmlinux.bin

# Download a rootfs
wget https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/disks/${ARCH}/ubuntu-18.04.ext4

# Download the ssh key for the rootfs
wget https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/disks/${ARCH}/ubuntu-18.04.id_rsa

# Set user read permission on the ssh key
chmod 400 ./ubuntu-18.04.id_rsa

wget https://github.com/firecracker-microvm/firecracker/releases/download/v1.4.0/firecracker-v1.4.0-${ARCH}.tgz

tar -xzvf firecracker-v1.4.0-${ARCH}.tgz

cp release-v1.4.0-${ARCH}/firecracker-v1.4.0-${ARCH} ./firecracker

Run run.sh:

API_SOCKET="/tmp/firecracker.socket"

# Remove API unix socket
rm -f $API_SOCKET

# Run firecracker
./firecracker --api-sock "${API_SOCKET}"

In a new terminal run start.sh:

TAP_DEV="tap0"
TAP_IP="172.16.0.1"
MASK_SHORT="/30"

# Setup network interface
sudo ip link del "$TAP_DEV" 2> /dev/null || true
sudo ip tuntap add dev "$TAP_DEV" mode tap
sudo ip addr add "${TAP_IP}${MASK_SHORT}" dev "$TAP_DEV"
sudo ip link set dev "$TAP_DEV" up

# Enable ip forwarding
sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"

# Set up microVM internet access
sudo iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE || true
sudo iptables -D FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT \
    || true
sudo iptables -D FORWARD -i tap0 -o eth0 -j ACCEPT || true
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
sudo iptables -I FORWARD 1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
sudo iptables -I FORWARD 1 -i tap0 -o eth0 -j ACCEPT

API_SOCKET="/tmp/firecracker.socket"
LOGFILE="./firecracker.log"

# Create log file
touch $LOGFILE

# Set log file
curl -X PUT --unix-socket "${API_SOCKET}" \
    --data "{
        \"log_path\": \"${LOGFILE}\",
        \"level\": \"Debug\",
        \"show_level\": true,
        \"show_log_origin\": true
    }" \
    "http://localhost/logger"

KERNEL="./vmlinux.bin"
KERNEL_BOOT_ARGS="console=ttyS0 reboot=k panic=1 pci=off"

ARCH=$(uname -m)

if [ "${ARCH}" = "aarch64" ]; then
    KERNEL_BOOT_ARGS="keep_bootcon ${KERNEL_BOOT_ARGS}"
fi

# Set boot source
curl -X PUT --unix-socket "${API_SOCKET}" \
    --data "{
        \"kernel_image_path\": \"${KERNEL}\",
        \"boot_args\": \"${KERNEL_BOOT_ARGS}\"
    }" \
    "http://localhost/boot-source"

ROOTFS="./ubuntu-18.04.ext4"

# Set rootfs
curl -X PUT --unix-socket "${API_SOCKET}" \
    --data "{
        \"drive_id\": \"rootfs\",
        \"path_on_host\": \"${ROOTFS}\",
        \"is_root_device\": true,
        \"is_read_only\": false
    }" \
    "http://localhost/drives/rootfs"

# The IP address of a guest is derived from its MAC address with
# `fcnet-setup.sh`, this has been pre-configured in the guest rootfs. It is
# important that `TAP_IP` and `FC_MAC` match this.
FC_MAC="06:00:AC:10:00:02"

# Set network interface
curl -X PUT --unix-socket "${API_SOCKET}" \
    --data "{
        \"iface_id\": \"net1\",
        \"guest_mac\": \"$FC_MAC\",
        \"host_dev_name\": \"$TAP_DEV\"
    }" \
    "http://localhost/network-interfaces/net1"

# API requests are handled asynchronously, it is important the configuration is
# set, before `InstanceStart`.
sleep 0.015s

# Start microVM
curl -X PUT --unix-socket "${API_SOCKET}" \
    --data "{
        \"action_type\": \"InstanceStart\"
    }" \
    "http://localhost/actions"

# API requests are handled asynchronously, it is important the microVM has been
# started before we attempt to SSH into it.
sleep 0.015s

# SSH into the microVM
sudo ssh -i ./ubuntu-18.04.id_rsa 172.16.0.2

# Use `root` for both the login and password.
# Run `reboot` to exit.

It fails with the error from above:

{"fault_message":"Internal error while starting microVM: Error configuring the vcpu for boot: Failed to apply modifications to CPUID: Failed to apply modifications to Intel CPUID: Failed to get brand string: Missing frequency: [68, 79, 45, 80, 114, 101, 109, 105, 117, 109, 45, 73, 110, 116, 101, 108, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]."}

Expected behaviour

Version 1.3.3 works, so downloading that binary and using it instead succeeds:

Modify setup.sh and run it:

ARCH="$(uname -m)"

# Download a linux kernel binary
wget https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/${ARCH}/kernels/vmlinux.bin

# Download a rootfs
wget https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/disks/${ARCH}/ubuntu-18.04.ext4

# Download the ssh key for the rootfs
wget https://s3.amazonaws.com/spec.ccfc.min/ci-artifacts/disks/${ARCH}/ubuntu-18.04.id_rsa

# Set user read permission on the ssh key
chmod 400 ./ubuntu-18.04.id_rsa

wget https://github.com/firecracker-microvm/firecracker/releases/download/v1.3.3/firecracker-v1.3.3-${ARCH}.tgz

tar -xzvf firecracker-v1.3.3-${ARCH}.tgz

cp release-v1.3.3-${ARCH}/firecracker-v1.3.3-${ARCH} ./firecracker

Run run.sh and start.sh as described above. You can now log in as root.

Environment

Additional context

I want to use the new entropy endpoint in version 1.4.0, but because of this error I can't try it out on a cheap Digital Ocean droplet.

Kos-M commented 1 year ago

Same here. I found this bug while trying to run the quick start example from the Running Firecracker section.

{"fault_message":"Internal error while starting microVM: Error configuring the vcpu for boot: Failed to apply modifications to CPUID: Failed to apply modifications to Intel CPUID: Failed to get brand string: Missing frequency: [49, 50, 116, 104, 32, 71, 101, 110, 32, 73, 110, 116, 101, 108, 40, 82, 41, 32, 67, 111, 114, 101, 40, 84, 77, 41, 32, 105, 53, 45, 49, 50, 51, 53, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]."}

I think I'm probably missing an argument in some step, but I'm not sure. My host OS/CPU specs, if needed:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            12th Gen Intel(R) Core(TM) i5-1235U
    CPU family:          6
    Model:               154
    Thread(s) per core:  2
    Core(s) per socket:  10
    Socket(s):           1
    Stepping:            4
    CPU(s) scaling MHz:  39%
    CPU max MHz:         4400.0000
    CPU min MHz:         400.0000
    BogoMIPS:            4992.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss 
                         ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                          arch_perfmon pebs bts rep_good nopl xtopology nonstop_
                         tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
                         4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
                         pr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_
                         timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefet
                         ch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced t
                         pr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase ts
                         c_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx sm
                         ap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xget
                         bv1 xsaves split_lock_detect avx_vnni dtherm ida arat p
                         ln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_re
                         q hfi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid
                          movdiri movdir64b fsrm md_clear serialize arch_lbr ibt
                          flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   352 KiB (10 instances)
  L1i:                   576 KiB (10 instances)
  L2:                    6.5 MiB (4 instances)
  L3:                    12 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB fillin
                         g, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Not affected
~ $ uname -a
Linux t13 6.2.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 10 23:39:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Following the second tutorial from this gist, I get this error:

2023-07-30T22:11:52.826354212 [anonymous-instance:main:ERROR:src/firecracker/src/main.rs:503] Building VMM configured from cmdline json failed: Internal(VcpuConfigure(NormalizeCpuidError(Intel(GetBrandString(MissingFrequency([49, 50, 116, 104, 32, 71, 101, 110, 32, 73, 110, 116, 101, 108, 40, 82, 41, 32, 67, 111, 114, 101, 40, 84, 77, 41, 32, 105, 53, 45, 49, 50, 51, 53, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))))))

config:

{
  "boot-source": {
    "kernel_image_path": "hello-vmlinux.bin",
    "boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off nomodules random.trust_cpu=on ip=169.254.0.21::169.254.0.22:255.255.255.252::wlo1:off"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "hello-rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "network-interfaces": [
      {
          "iface_id": "wlo1",
          "guest_mac": "02:FC:00:00:00:05",
          "host_dev_name": "fc-88-tap0"
      }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 1024,
    "smt": true
  }
}

wearyzen commented 1 year ago

Hi @sean-nicholas, @Kos-M,

Thank you for reporting this. This is to confirm that I do see the change in behaviour in v1.4 vs v1.3.3 and I am looking into this.

rastoku commented 1 year ago

Hi,

I am facing a similar issue in version 1.4.0 (compiled from source). The VM does not start and the log shows this message:

"2023-08-03T22:00:57.891597821 [anonymous-instance:main] Building VMM configured from cmdline json failed: Internal(VcpuConfigure(NormalizeCpuidError(Intel(GetBrandString(Overflow)))))"

But the VM starts successfully if I use version 1.3.3 with the same config, kernel and root fs.

It is happening on a laptop with "Intel(R) Celeron(R) N5100 @ 1.10GHz".

Thank you in advance

wearyzen commented 1 year ago

Hi @sean-nicholas, @Kos-M, @rastoku, we have released Firecracker v1.4.1. Could you please give it a try and let us know whether the issue is fixed with the new release?
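For anyone retrying with the new release, the download steps from setup.sh can be parameterised by version. This is a sketch that assumes the v1.4.1 release assets follow the same naming scheme as the v1.4.0 and v1.3.3 archives used earlier in this thread; `release_url` and `FC_VERSION` are hypothetical names introduced for illustration.

```shell
# Assumption: v1.4.1 assets are named like the v1.4.0 ones in setup.sh.
FC_VERSION="v1.4.1"
ARCH="$(uname -m)"

# Hypothetical helper: build the release archive URL.
# $1 = version tag (e.g. v1.4.1), $2 = architecture (e.g. x86_64)
release_url() {
    printf 'https://github.com/firecracker-microvm/firecracker/releases/download/%s/firecracker-%s-%s.tgz\n' "$1" "$1" "$2"
}

# Same steps as setup.sh, left commented out so nothing is downloaded here:
# wget "$(release_url "$FC_VERSION" "$ARCH")"
# tar -xzvf "firecracker-${FC_VERSION}-${ARCH}.tgz"
# cp "release-${FC_VERSION}-${ARCH}/firecracker-${FC_VERSION}-${ARCH}" ./firecracker
```

run.sh and start.sh can then be used unchanged.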

sean-nicholas commented 1 year ago

Hey @sudanl0

I can confirm: It works with 1.4.1 again :)

Thank you very much. Was a really great experience. Fast first response and fast fix & version bump. Love it :)

wearyzen commented 1 year ago

Hi @sean-nicholas, thank you for confirming this. Happy to see that you are unblocked and sorry for any inconvenience. I'll wait for feedback from @Kos-M and @rastoku as well before closing the issue.

rastoku commented 1 year ago

Hi all,

the VM starts correctly with 1.4.1. Thank you!

BR,
Rasto


Kos-M commented 1 year ago

I can confirm it too: my VM spun up as expected. Thank you @sudanl0 for solving this so quickly.

wearyzen commented 1 year ago

Closing the issue, thank you all for confirming the fix.