Closed dosisod closed 1 year ago
Hello, thanks for opening an issue.
I cannot reproduce the issue on my side with the reproduction steps, so I'd like more information.
Could you give me the strace result (strace.txt) produced by the following command?
$ sudo strace -f -o strace.txt \
./firecracker \
--api-sock /tmp/firecracker.socket \
--level Debug \
--log-path /tmp/firelog \
--show-level \
--show-log-origin
Thanks,
Here are the contents of strace.txt (using version v1.2.0 from the release page):
```txt
373207 execve("./firecracker", ["./firecracker", "--api-sock", "/tmp/firecracker.socket", "--level", "Debug", "--log-path", "/tmp/firelog", "--show-level", "--show-log-origin"], 0x7ffc0eda09c8 /* 15 vars */) = 0
373207 mmap(NULL, 392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f580b059000
373207 arch_prctl(ARCH_SET_FS, 0x7f580b0590a8) = 0
373207 set_tid_address(0x81b7d8) = 373207
373207 poll([{fd=0, events=0}, {fd=1, events=0}, {fd=2, events=0}], 3, 0) = 0 (Timeout)
373207 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x5acb3b}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
373207 rt_sigaction(SIGSEGV, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
373207 rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
373207 rt_sigaction(SIGSEGV, {sa_handler=0x4d2a40, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGBUS, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
373207 rt_sigaction(SIGBUS, {sa_handler=0x4d2a40, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
373207 mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f580b056000
373207 mprotect(0x7f580b056000, 4096, PROT_NONE) = 0
373207 sigaltstack({ss_sp=0x7f580b057000, ss_flags=0, ss_size=8192}, NULL) = 0
373207 brk(NULL) = 0x1189000
373207 brk(0x118a000) = 0x118a000
373207 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
373207 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
373207 rt_sigaction(SIGSYS, {sa_handler=0x533540, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGBUS, {sa_handler=0x533b00, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGSEGV, {sa_handler=0x534100, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGXFSZ, {sa_handler=0x534700, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGXCPU, {sa_handler=0x534d00, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGPIPE, {sa_handler=0x535300, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGHUP, {sa_handler=0x535580, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 rt_sigaction(SIGILL, {sa_handler=0x535b80, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5acb3b}, NULL, 8) = 0
373207 brk(0x118d000) = 0x118d000
373207 open("/tmp/firelog", O_RDWR|O_NONBLOCK|O_CLOEXEC) = 3
373207 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
373207 write(3, "Running Firecracker v1.2.0\n", 27) = 27
373207 getrandom("\x76\x6a\xe3\x34\xb5\x95\xa4\x43\xb1\x4b\x7d\x3b\x5f\x70\x3b\xeb", 16, GRND_INSECURE) = 16
373207 brk(0x118e000) = 0x118e000
373207 brk(0x118f000) = 0x118f000
373207 eventfd2(0, 0) = 4
373207 dup(4) = 5
373207 rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
373207 mmap(NULL, 2109440, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f580ae53000
373207 mprotect(0x7f580ae55000, 2101248, PROT_READ|PROT_WRITE) = 0
373207 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
373207 clone(child_stack=0x7f580b055a48, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|0x400000, parent_tid=[373208], tls=0x7f580b055b20, child_tidptr=0x81b7d8) = 373208
373207 rt_sigprocmask(SIG_SETMASK, [],
```
And here are the accompanying logs from /tmp/firelog for reference:
Running Firecracker v1.2.0
2023-02-08T20:05:28.889047591 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:216] The API server received a Put request on "/boot-source" with body "{\n \"kernel_image_path\": \"/home/loot/code/firecracker/hello-vmlinux.bin\",\n \"boot_args\": \"console=ttyS0 reboot=k panic=1 pci=off\"\n }".
2023-02-08T20:05:28.889926455 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:163] The request was executed successfully. Status code: 204 No Content.
2023-02-08T20:05:28.890091431 [anonymous-instance:fc_api:DEBUG:src/api_server/src/lib.rs:211] Total previous API call duration: 1055 us.
2023-02-08T20:05:28.904788589 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:216] The API server received a Put request on "/drives/rootfs" with body "{\n \"drive_id\": \"rootfs\",\n \"path_on_host\": \"/home/loot/code/firecracker/hello-rootfs.ext4\",\n \"is_root_device\": true,\n \"is_read_only\": false\n }".
2023-02-08T20:05:28.906205655 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:163] The request was executed successfully. Status code: 204 No Content.
2023-02-08T20:05:28.906358517 [anonymous-instance:fc_api:DEBUG:src/api_server/src/lib.rs:211] Total previous API call duration: 1578 us.
2023-02-08T20:05:28.913189658 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:216] The API server received a Put request on "/actions" with body "{\n \"action_type\": \"InstanceStart\"\n }".
2023-02-08T20:05:28.991391247 [anonymous-instance:main:INFO:src/vmm/src/device_manager/mmio.rs:408] Artificially kick devices.
2023-02-08T20:05:28.991879719 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:163] The request was executed successfully. Status code: 204 No Content.
2023-02-08T20:05:28.992201201 [anonymous-instance:fc_api:DEBUG:src/api_server/src/lib.rs:211] Total previous API call duration: 79015 us.
I ran it using the same procedure as above, and after about 30 seconds I killed it.
Thank you for the response, hopefully this info is helpful!
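For anyone comparing timings across machines, the "Total previous API call duration" lines in the log above can be extracted with a short script. This is only a sketch: the regular expression is inferred from the three duration lines posted here, not from any documented Firecracker log format, and the function name `api_call_durations` is made up for illustration.

```python
import re

# Pattern inferred from lines in this thread such as:
#   <timestamp> [<instance>:<thread>:<level>:<file>:<line>] Total previous API call duration: 1055 us.
# It is an assumption, not a documented log format.
DURATION_RE = re.compile(
    r"^(?P<timestamp>\S+) \[(?P<origin>[^\]]+)\] "
    r"Total previous API call duration: (?P<micros>\d+) us\.$"
)


def api_call_durations(log_text):
    """Return a list of (timestamp, microseconds) for each duration line."""
    durations = []
    for line in log_text.splitlines():
        match = DURATION_RE.match(line.strip())
        if match:
            durations.append(
                (match.group("timestamp"), int(match.group("micros")))
            )
    return durations
```

Calling `api_call_durations(open("/tmp/firelog").read())` on the log above should list the three calls, with the InstanceStart request being by far the slowest (79015 us).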
Thanks for providing the logs so quickly!
As far as I can see from your strace log, it reached the KVM_RUN call and appears to have started a guest VM.
Please give me more time to dive deep into this.
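A quick way to check whether a given strace.txt reached KVM_RUN is to count the lines that mention it, grouped by thread. The sketch below assumes `strace -f -o` output, where each line begins with the ID of the calling thread, and assumes strace prints the ioctl request by name as `KVM_RUN`. The function name is hypothetical.

```python
from collections import Counter


def kvm_run_calls_per_thread(strace_text):
    """Count lines mentioning KVM_RUN, keyed by the leading thread ID.

    Assumes each line starts with a numeric thread ID followed by a space,
    as in the strace.txt excerpt posted in this thread.
    """
    counts = Counter()
    for line in strace_text.splitlines():
        tid, _, rest = line.strip().partition(" ")
        if tid.isdigit() and "KVM_RUN" in rest:
            counts[tid] += 1
    return dict(counts)
```

An empty result would suggest the vCPU thread never entered the guest; a non-empty one only shows that KVM_RUN was issued, not that the guest made progress.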
Let me share a log on my side:
Host info:
The main difference is the host kernel version (5.10 vs. 6.1). If you have a machine running kernel 5.10, I'd appreciate it if you could give it a try.
Thanks,
I tried on Arch Linux with kernel 6.1 (AMI from here), but I couldn't reproduce that.
/tmp/firelog:
strace.txt
Hi @dosisod, I tried to reproduce the issue, but I couldn't. Is there anything else that you have noticed on your side?
I have yet to test my Arch build on a different computer, or test a different OS on my current computer. Given that you have tested this on an Arch 6.1 kernel I suspect it is either something wrong with my OS configuration, or something wrong with my physical hardware.
I will see what I can do about narrowing this down and get back to you when I have some more info.
Hi @dosisod,
Thanks for reporting the issue. We were not able to reproduce it in our testing with host kernel 6.1. At the same time, we have further improved Firecracker and our testing methods for kernel 6.1 since the issue was opened. We are closing this issue and suggest trying to boot a microVM using a newer Firecracker version. Please feel free to reopen this issue if you are still experiencing problems.
@xmarcalx I just tested this again and it works. I'm on kernel version 6.4.10 now, so perhaps something since then fixed it. I tried bisecting Firecracker to see if it was Firecracker related, but I cannot build 81e771a (the commit when I opened this), v1.2.0, or v1.3.0 due to some vmm-related build errors (I could post the build errors if you like). The earliest version I could find that actually compiled was v1.4.0, and that one works.
Thanks for bringing this up again, I suspect this was a kernel/CPU bug and/or computer issues on my end.
Describe the bug
After setting up Firecracker (based on the getting started docs), Firecracker simply hangs after the guest VM is started. Nothing is printed to stdout: no errors, no logs, nothing.
This is the output I am getting:
And the logs from /tmp/firelog:
To Reproduce
After starting Firecracker, I ran the following curl commands (still using the commands from the intro docs):
Running the above script as sudo produces the following:
After that, the Firecracker instance in the other terminal still hasn't done anything.
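The three requests can also be reconstructed from the /tmp/firelog lines quoted in this thread. The sketch below rebuilds them in Python instead of curl. The paths, body fields, and socket path come from those log lines and the `--api-sock` flag; the helper names `build_requests` and `put`, the hand-written HTTP framing, and the headers are assumptions for illustration, not the commands from the docs.

```python
import json
import socket

API_SOCKET = "/tmp/firecracker.socket"  # value passed to --api-sock in this thread


def build_requests(kernel_path, rootfs_path):
    """Return the (path, body) pairs seen in /tmp/firelog, in order."""
    return [
        ("/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("/actions", {"action_type": "InstanceStart"}),
    ]


def put(path, body, socket_path=API_SOCKET):
    """Send one HTTP PUT over the Unix socket and return the status line.

    Minimal framing for illustration only: it opens a new connection per
    request and reads just the first chunk of the response.
    """
    payload = json.dumps(body).encode()
    head = (
        f"PUT {path} HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Accept: application/json\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(socket_path)
        conn.sendall(head + payload)
        response = conn.recv(4096)
    return response.split(b"\r\n", 1)[0].decode()
```

With Firecracker running in another terminal, looping over `build_requests(kernel, rootfs)` and calling `put(path, body)` for each pair should produce a status line per request; the logs in this thread show 204 No Content for all three.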
Expected behaviour
I expect to see a login prompt.
Environment
Firecracker Version
I tried out both v1.2.0 and main (currently at 81e771a); neither worked for me.
Guest Version Info
I can't say for certain what version the guest kernel is using (since I cannot get it booted), but it is the same kernel used in the docs:
https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/x86_64/kernels/vmlinux.bin
Same goes for the rootfs:
https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/x86_64/rootfs/bionic.rootfs.ext4
Host Version Info
OS Info:
CPU info:
KVM/Virtio related kernel modules:
I loaded every KVM/Virtio kernel module available on my system to see if that would fix anything, to no avail.
Additional context
This issue is currently keeping me from using Firecracker on my dev machine. My inclination tells me that this is a KVM related issue (either with my host machine or in the way Firecracker is interfacing with the KVM), though I can't say for certain.
Things I have tried to do to fix/diagnose the issue:
Checked dmesg and journalctl for errors (found none)
I've done everything I can think of to diagnose it, but I still don't know what the issue might be. I can't say whether it is something with my setup or something with Firecracker itself. I don't see any crashes or error messages, which I find odd; it just isn't doing anything.