firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0
25.11k stars 1.76k forks source link

Rust v1.33.0 causes 148 exit code #997

Closed xibz closed 5 years ago

xibz commented 5 years ago

So I am running the SDK for Firecracker in Docker and noticed that Firecracker exits with 148 and with the error message Shutting down VM after intercepting a bad syscall (288). Which is confusing to why that is occurring. The tests work fine without running in Docker.

The simplest way to reproduce this would be to checkout my docker branch from my repo and running make sandbox-test-fc.

Here are the logs from the test

2019-03-09T00:03:51.540470092 [b:DEBUG] loop poll - 722ns                                                                                                                                                                                                                                                                                                                    [61/1725]
2019-03-09T00:03:51.540477324 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 727231344 }                                                                                                                                                                                                                                                                                   
2019-03-09T00:03:51.540490866 [b:INFO] The synchronous Put request on "/drives/1" with body "{\"drive_id\":\"1\",\"is_read_only\":false,\"is_root_device\":true,\"path_on_host\":\"root-drive.img\"}\n" was executed successfully. Status code: 204 No Content.                                                                                                                       
2019-03-09T00:03:51.540504378 [b:DEBUG] flushed 64 bytes                                                                                                                                                                                                                                                                                                                              
2019-03-09T00:03:51.540515956 [b:DEBUG] scheduling direction for: 1
2019-03-09T00:03:51.540523073 [b:DEBUG] blocking
2019-03-09T00:03:51.540531285 [b:DEBUG] loop process - 1 events, 61.172µs
2019-03-09T00:03:51.540539998 [b:DEBUG] loop poll - 1.193µs
2019-03-09T00:03:51.540546390 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 727301249 }
time="2019-03-09T00:03:51Z" level=debug msg="HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nDate: Sat, 09 Mar 2019 00:03:51 GMT\r\n\r\n\n"
2019-03-09T00:03:51.540554431 [b:DEBUG] loop process - 1 events, 14.435µs
time="2019-03-09T00:03:51Z" level=info msg="Attached drive root-drive.img: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "
time="2019-03-09T00:03:51Z" level=debug msg="attachDrive returned for root-drive.img"
time="2019-03-09T00:03:51Z" level=debug msg="Running handler fcinit.CreateNetworkInterfaces"
time="2019-03-09T00:03:51Z" level=debug msg="Running handler fcinit.AddVsocks"
time="2019-03-09T00:03:51Z" level=debug msg="PUT /actions HTTP/1.1\r\nHost: localhost\r\nUser-Agent: Go-http-client/1.1\r\nContent-Length: 32\r\nAccept: application/json\r\nContent-Type: application/json\r\nAccept-Encoding: gzip\r\n\r\n{\"action_type\":\"InstanceStart\"}\n\n"
2019-03-09T00:03:51.540739431 [b:DEBUG] loop poll - 177.481µs
2019-03-09T00:03:51.540745215 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 727500691 }
2019-03-09T00:03:51.540753318 [b:DEBUG] notifying a task handle
2019-03-09T00:03:51.540760945 [b:DEBUG] loop process - 1 events, 21.51µs
2019-03-09T00:03:51.540769415 [b:DEBUG] loop poll - 686ns
2019-03-09T00:03:51.540776784 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 727530669 }
2019-03-09T00:03:51.540788919 [b:DEBUG] read 207 bytes
2019-03-09T00:03:51.540798368 [b:DEBUG] parsed 6 headers (175 bytes)
2019-03-09T00:03:51.540806786 [b:DEBUG] incoming body is content-length (32 bytes)
2019-03-09T00:03:51.540818909 [b:DEBUG] loop process - 1 events, 49.478µs
2019-03-09T00:03:51.540826012 [b:DEBUG] loop poll - 749ns
2019-03-09T00:03:51.540833259 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 727587256 }
2019-03-09T00:03:51.540841703 [b:DEBUG] incoming body completed
2019-03-09T00:03:51.540859607 [b:INFO] The API server received a synchronous Put request on "/actions" with body "{\"action_type\":\"InstanceStart\"}\n".
2019-03-09T00:03:51.540866410 [b:DEBUG] loop process - 1 events, 40.396µs
2019-03-09T00:03:51.540876119 [b:DEBUG] loop poll - 665ns
2019-03-09T00:03:51.540874684 [b:INFO] VMM received instance start command
2019-03-09T00:03:51.540885520 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 727637365 }
2019-03-09T00:03:51.540895240 [b:DEBUG] loop process - 1 events, 19.121µs
2019-03-09T00:03:51.540905323 [b:DEBUG] loop poll - 628ns
2019-03-09T00:03:51.540913240 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 727666574 }
2019-03-09T00:03:51.540917888 [b:WARN] Disk size 1 is not a multiple of sector size 512; the remainder will not be visible to the guest.
2019-03-09T00:03:51.540922700 [b:DEBUG] loop process - 0 events, 17.386µs
2019-03-09T00:03:51.540943425 [b:INFO] Guest memory starts at 7f19d942d000
2019-03-09T00:03:51.549006781 [b:WARN] Could not add stdin event to epoll. Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }
2019-03-09T00:03:51.549707482 [b:DEBUG] loop poll - 8.774999ms
2019-03-09T00:03:51.549733463 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 736468269 }
2019-03-09T00:03:51.549755779 [b:INFO] The synchronous Put request on "/actions" with body "{\"action_type\":\"InstanceStart\"}\n" was executed successfully. Status code: 204 No Content.
2019-03-09T00:03:51.549792749 [b:DEBUG] flushed 64 bytes
2019-03-09T00:03:51.549812082 [b:DEBUG] scheduling direction for: 1
2019-03-09T00:03:51.549823680 [b:DEBUG] blocking
2019-03-09T00:03:51.549839000 [b:DEBUG] loop process - 1 events, 131.826µs
2019-03-09T00:03:51.549853697 [b:DEBUG] loop poll - 2.342µs
2019-03-09T00:03:51.549867039 [b:DEBUG] loop time - Instant { tv_sec: 1486145, tv_nsec: 736614881 }
2019-03-09T00:03:51.549877575 [b:DEBUG] loop process - 1 events, 23.92µs
time="2019-03-09T00:03:51Z" level=debug msg="HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nDate: Sat, 09 Mar 2019 00:03:51 GMT\r\n\r\n\n"
time="2019-03-09T00:03:51Z" level=info msg="startInstance successful: [PUT /actions][204] createSyncActionNoContent "
time="2019-03-09T00:03:51Z" level=debug msg="Called NewMachine()"
time="2019-03-09T00:03:51Z" level=info msg="Called startVMM(), setting up a VMM on /tmp/build/TestMicroVMExecution.sock"
time="2019-03-09T00:03:51Z" level=debug msg="VMM started socket path is /tmp/build/TestMicroVMExecution.sock"
time="2019-03-09T00:03:51Z" level=debug msg="Setting up signal handler"
time="2019-03-09T00:03:51Z" level=debug msg="GET /machine-config HTTP/1.1\r\nHost: localhost\r\nUser-Agent: Go-http-client/1.1\r\nAccept: application/json\r\nAccept-Encoding: gzip\r\n\r\n\n"
time="2019-03-09T00:03:51Z" level=debug msg="GET /machine-config HTTP/1.1\r\nHost: localhost\r\nUser-Agent: Go-http-client/1.1\r\nAccept: application/json\r\nAccept-Encoding: gzip\r\n\r\n\n"
time="2019-03-09T00:03:51Z" level=warning msg="firecracker exited: exit status 148"
samuelkarp commented 5 years ago

I'm not sure if Docker has anything to do with it. I'm getting the same message and exit code on an m5.metal running Debian Stretch with musl (from Debian) and rust (from rustup).

installation details ``` admin@ip-172-31-33-228:~$ rustc --version rustc 1.33.0 (2aa4c46cf 2019-02-28) admin@ip-172-31-33-228:~$ dpkg -l | grep musl ii musl:amd64 1.1.16-3 amd64 standard C library ii musl-dev 1.1.16-3 amd64 standard C library development files ii musl-tools 1.1.16-3 amd64 standard C library tools ```
strace ``` execve("/usr/local/bin/firecracker", ["/usr/local/bin/firecracker", "--api-sock", "./firecracker.sock"], [/* 17 vars */]) = 0 mmap(NULL, 704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff438b90000 arch_prctl(ARCH_SET_FS, 0x7ff438b901c0) = 0 set_tid_address(0x7ff438b90200) = 58121 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x5c4cce}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0 rt_sigaction(SIGSEGV, {sa_handler=0x5768a0, sa_mask=[], sa_flags=SA_RESTORER|SA_STACK|SA_SIGINFO, sa_restorer=0x5c4cce}, NULL, 8) = 0 rt_sigaction(SIGBUS, {sa_handler=0x5768a0, sa_mask=[], sa_flags=SA_RESTORER|SA_STACK|SA_SIGINFO, sa_restorer=0x5c4cce}, NULL, 8) = 0 sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff438b8e000 sigaltstack({ss_sp=0x7ff438b8e000, ss_flags=0, ss_size=8192}, NULL) = 0 brk(NULL) = 0xf0e000 brk(0xf0f000) = 0xf0f000 rt_sigaction(SIGSYS, {sa_handler=0x457c80, sa_mask=~[RTMIN RT_1 RT_2], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x5c4cce}, NULL, 8) = 0 getrandom("\344\347\37\210\302S)\261\340\256\317;]r\267(", 16, GRND_NONBLOCK) = 16 brk(0xf10000) = 0xf10000 eventfd2(0, EFD_NONBLOCK) = 4 dup(4) = 5 rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0 mmap(NULL, 2105344, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff43898c000 mprotect(0x7ff43898d000, 2101248, PROT_READ|PROT_WRITE) = 0 clone(child_stack=0x7ff438b8d938, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|0x400000, parent_tidptr=0x7ff438b8db38, tls=0x7ff438b8db00, child_tidptr=0x7ff438b8db40) = 58122 epoll_create1(0) = 6 fcntl(6, F_GETFD) = 0 fcntl(6, F_SETFD, FD_CLOEXEC) = 0 pipe([10, 11]) = 0 fcntl(10, F_SETFL, O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 0 fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 0 epoll_ctl(6, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLET, {u32=4294967295, u64=18446744073709551615}}) = 0 brk(0xf14000) = 0xf14000 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 12 bind(12, {sa_family=AF_UNIX, sun_path="./firecracker.sock"}, 21) = 0 listen(12, 128) = 0 epoll_ctl(6, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2, u64=2}}) = 0 brk(0xf15000) = 0xf15000 brk(0xf16000) = 0xf16000 brk(0xf17000) = 0xf17000 brk(0xf18000) = 0xf18000 brk(0xf1a000) = 0xf1a000 prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) = 0 prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, {len=653, filter=0xf13a40}) = 0 epoll_pwait(6, [], 1024, 0, NULL, 8) = 0 epoll_pwait(6, [], 1024, 0, NULL, 8) = 0 epoll_pwait(6, [{EPOLLIN, {u32=2, u64=2}}], 1024, -1, NULL, 8) = 1 epoll_pwait(6, [], 1024, 0, NULL, 8) = 0 accept4(12, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, [110], SOCK_CLOEXEC) = 288 --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x5c46d9, si_syscall=__NR_accept4, si_arch=AUDIT_ARCH_X86_64} --- brk(0xf1d000) = 0xf1d000 open("/etc/localtime", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 20 fstat(20, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 mmap(NULL, 127, PROT_READ, MAP_SHARED, 20, 0) = 0x7ff438989000 close(20) = 0 write(2, "2019-03-09T00:51:19.682408203 [a"..., 1422019-03-09T00:51:19.682408203 [anonymous-instance:ERROR:vmm/src/sigsys_handler.rs:70] Shutting down VM after intercepting a bad syscall (288).) = 142 write(2, "\n", 1 ) = 1 write(2, "2019-03-09T00:51:19.682695832 [a"..., 1512019-03-09T00:51:19.682695832 [anonymous-instance:ERROR:vmm/src/sigsys_handler.rs:76] Failed to log metrics while stopping: Logger was not initialized.) = 151 write(2, "\n", 1 ) = 1 exit_group(148) = ? +++ exited with 148 +++ ```

The disallowed call appears to be an accept4 on the socket file (fd 12, see the call to socket() = 12 and bind(12, {sa_family=AF_UNIX, sun_path="./firecracker.sock"}, 21) earlier in the strace output).

samuelkarp commented 5 years ago

This looks to be a change from Rust 1.32 to Rust 1.33. Building with Rust 1.32 appears to work.

xibz commented 5 years ago

@samuelkarp - Updating to 1.32 worked for me as well. I'll go ahead and update the title to better reflect this being a Rust v1.33.0 issue.

andreeaflorescu commented 5 years ago

We haven't updated yet to Rust 1.33.0. The supported Rust version is the one from the fcuvm container(1.32.0). We will fix the seccomp filters once we update to 1.33.0. Do you have any constraints for which you must build Firecracker with 1.33.0?

xibz commented 5 years ago

@andreeaflorescu - Naw, I don't, as in where I need to build with 1.33.0. rustup just automatically installs 1.33.0, which led to some confusion to why things were suddenly failing.