erickt / rust-zmq

Rust zeromq bindings.
Apache License 2.0

Context creation fails on alpine linux #337

Closed mbuczko closed 1 week ago

mbuczko commented 2 years ago

Hey, I'm trying to run the simplest possible code in an alpine:latest container:

use zmq::Context;

fn main() {
    Context::new();
}

Cargo.toml:

[package]
name = "sample"
version = "0.1.0"
edition = "2018"

[dependencies]
zmq = "0.9"

and it painfully dies with a core dump:

~/sample # ./target/debug/sample
Segmentation fault (core dumped)

Something weird is happening during context creation which leads to the core dump. I tried to strace it a bit, but I still have no idea what the root cause is:

~ # strace ./target/debug/sample
execve("./target/debug/sample", ["./target/debug/sample"], 0x7ffc5ff82360 /* 7 vars */) = 0
mmap(NULL, 368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd152c5d000
arch_prctl(ARCH_SET_FS, 0x7fd152c5d090) = 0
set_tid_address(0x7fd152cd0da0)         = 345
poll([{fd=0, events=0}, {fd=1, events=0}, {fd=2, events=0}], 3, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd152caaa00}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGSEGV, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=0x7fd152c83240, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_SIGINFO, sa_restorer=0x7fd152caaa00}, NULL, 8) = 0
rt_sigaction(SIGBUS, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=0x7fd152c83240, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_SIGINFO, sa_restorer=0x7fd152caaa00}, NULL, 8) = 0
sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fd152c5a000
mprotect(0x7fd152c5a000, 4096, PROT_NONE) = 0
sigaltstack({ss_sp=0x7fd152c5b000, ss_flags=0, ss_size=8192}, NULL) = 0
brk(NULL)                               = 0x5555571be000
brk(0x5555571bf000)                     = 0x5555571bf000
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fd152caaa00}, NULL, 8) = 0
rt_sigreturn({mask=[]})                 = 140537014043504
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Rustc details:

~ # rustc --version
rustc 1.59.0-nightly (78fd0f633 2021-12-29)

(It fails with the Alpine-packaged Rust too.) libzmq was installed from the Alpine package (apk add libzmq), though I tried a self-compiled version as well; same result.

Any hint as to what might be the root cause of this problem?

Earthson commented 2 years ago

same problem:(

Earthson commented 2 years ago
Breakpoint 2, ztest::main () at src/main.rs:5
5           let ctx = zmq::Context::new();
(gdb) step
zmq::Context::new () at /usr/local/cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/zmq-0.9.2/src/lib.rs:434
434                     ctx: unsafe { zmq_sys::zmq_ctx_new() },
(gdb) step

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()

daniel-brenot-apcapital commented 2 years ago

Same problem here. Any interest in solving this problem? This is a huge issue for me right now.

daniel-brenot-apcapital commented 2 years ago

Additionally, this seems to happen on any docker image, not just alpine. I tried this on ubuntu as well with no luck.

daniel-brenot-apcapital commented 1 year ago

@Jasper-Bekkers Any clue here? I'm trying to use this in production but I can't release it until this is fixed. This one's a pretty huge deal. Any help is appreciated.

Jasper-Bekkers commented 1 year ago

This looks a bit like it's happening inside zmq directly; the current release binds against a very old version of zmq.

https://github.com/erickt/rust-zmq/pull/345 is the path forward - it upgrades some things (as well as the zmq library itself), so it potentially has some fixes that apply to you as well.

My first instinct was some kind of FFI error, but since Context::new takes no parameters I don't think that's the case, so you'll have to step into zmq_ctx_new to really debug what's going on with your access violation. Another guess would be a libc mix-up, since alpine is potentially on musl?

daniel-brenot-apcapital commented 1 year ago

I tried the zmq2 library and I get the same error with it, so I don't think it has to do with the zmq version.

Jasper-Bekkers commented 1 year ago

Too bad! In that case I suspect you'll need to debug this a bit yourself; I'm happy to review a fix if you have one.

daniel-brenot-apcapital commented 1 year ago

Alright, well I'm posting a bounty on Bountysource then, since this is important to me but I'm not quite sure how to solve it.

abcpro1 commented 1 year ago

Hi. I would like to chase this bug and solve it ASAP. Can I reproduce it in docker?

I couldn't reproduce it myself.

daniel-brenot-apcapital commented 1 year ago

It should be easy to reproduce in docker by just building and then running it. It won't happen at build time; it happens at runtime.

Edit: Feel free to email me at the email posted on my account and I can see about setting up a demo.

daniel-brenot-apcapital commented 1 year ago

I have created an example for this issue: https://github.com/daniel-brenot-apcapital/example-zmq

abcpro1 commented 1 year ago

It seems that the dynamic linker is not operating correctly. This is an issue with the rust compiler installed by rustup, not with the rust compiler packaged by alpine linux.

Some technical details about the segfault:

This is where the segfault is happening:

(gdb) disas
Dump of assembler code for function _ZnwmRKSt9nothrow_t@plt:
=> 0x00007ffff7ee3050 <+0>:     jmp    QWORD PTR [rip+0x1184fa]        # 0x7ffff7ffb550 <_ZnwmRKSt9nothrow_t@got.plt>
   0x00007ffff7ee3056 <+6>:     push   0x3
   0x00007ffff7ee305b <+11>:    jmp    0x7ffff7ee3010

(gdb) x/a 0x7ffff7ffb550
0x7ffff7ffb550 <_ZnwmRKSt9nothrow_t@got.plt>:   0x1c056

(gdb) i sharedlibrary
No shared libraries loaded at this time.

The segfault happens in the PLT stub for operator new, which is a dynamically linked function from libstdc++. The dynamic linker should have resolved this function and replaced the placeholder address 0x1c056 with the actual address of operator new from libstdc++. Also, gdb reports No shared libraries loaded at this time., which suggests that the dynamic linker was not able to load any shared libraries at all.

In contrast, this is the same output from a correct binary (which I will explain how to build):

(gdb) disas
Dump of assembler code for function _ZnwmRKSt9nothrow_t@plt:
=> 0x000055555556e070 <+0>:     jmp    *0xf52ca(%rip)        # 0x555555663340 <_ZnwmRKSt9nothrow_t@got.plt>
   0x000055555556e076 <+6>:     push   $0x5
   0x000055555556e07b <+11>:    jmp    0x55555556e010

(gdb) x/a 0x555555663340
0x555555663340 <_ZnwmRKSt9nothrow_t@got.plt>:   0x7ffff7e4f6ed <_ZnwmRKSt9nothrow_t>

(gdb) i sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007ffff7f7c070  0x00007ffff7fc3877  Yes (*)     /lib/ld-musl-x86_64.so.1
0x00007ffff7e4bae0  0x00007ffff7ee54a1  Yes (*)     /usr/lib/libstdc++.so.6

Workaround:

If this problem is blocking you in production, use these instructions to solve it now: install and use the rust compiler provided by the rust package from the official alpine linux repository, and do not use rustup.

This is a modified dockerfile based on @daniel-brenot-apcapital's test sample, which will not produce a segfault:

FROM alpine AS builder
RUN apk add g++ rust cargo
ENV CARGO_HOME=/usr/local/cargo
WORKDIR /home/user/src
COPY . .
RUN CXXFLAGS=-DZMQ_HAVE_STRLCPY cargo install --path .

FROM alpine AS deploy
RUN apk add libstdc++
COPY --from=builder /usr/local/cargo/bin/ .

This is a similar dockerfile which will produce a segfault. The only difference is that the rust compiler is installed by rustup:

FROM alpine AS builder
RUN apk add g++ rustup
ENV CARGO_HOME=/usr/local/cargo \
    PATH=/usr/local/cargo/bin:$PATH
RUN rustup-init -y --no-modify-path --profile minimal
WORKDIR /home/user/src
COPY . .
RUN CXXFLAGS=-DZMQ_HAVE_STRLCPY cargo install --path .

FROM alpine AS deploy
RUN apk add libstdc++
COPY --from=builder /usr/local/cargo/bin/ .

I will look further to see why the dynamic linker does not work when rustc is installed by rustup.

abcpro1 commented 1 year ago

The reason the compiler packaged by alpine works is that it does not statically link by default, while vanilla rust does statically link by default for musl targets.

The issue with static linking is that cc, which compiles the zeromq C++ code, links libstdc++ dynamically. That is a reasonable choice, because the cross-compilation toolchain provided by rustup does not include a cross-compiled libstdc++. But if you really want to statically link libstdc++, you can set up .cargo/config to tell cargo where to look for a cross-compiled libstdc++.a for your target.

The solution to this issue is to create a cargo config file .cargo/config in your package directory with the following content:

[target.x86_64-unknown-linux-musl]
rustflags = ["-C", "target-feature=-crt-static"]

This disables static linking of the C runtime, so the binary is fully dynamically linked and the dynamic linker can resolve libstdc++, which fixes the segfault.
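The config above can be created from a shell in one step. This is a sketch, assuming the package directory is the current working directory; Cargo also accepts the .cargo/config.toml spelling of the same file:

```shell
# Create the per-package cargo config that disables crt-static
# for the x86_64 musl target.
mkdir -p .cargo
cat > .cargo/config <<'EOF'
[target.x86_64-unknown-linux-musl]
rustflags = ["-C", "target-feature=-crt-static"]
EOF
```

After rebuilding, running ldd on the produced binary (path depends on your package) should list the musl loader and libstdc++.so.6 instead of reporting a statically linked executable.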

Otherwise if you need to use a static libstdc++ you can change the content of the cargo config to this:

[target.x86_64-unknown-linux-musl]
rustflags = ["-L", "native=/usr/lib", "-l", "static=stdc++"]

This will statically link libstdc++, which should be located in /usr/lib. Note that this static library must be (cross) compiled for your target. If you are building on an alpine host, then /usr/lib/libstdc++.a is OK.
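The static-libstdc++ alternative can be set up the same way. A sketch, assuming an x86_64 alpine build host where apk add g++ has installed /usr/lib/libstdc++.a:

```shell
# Point cargo at the host's static libstdc++ for the musl target.
mkdir -p .cargo
cat > .cargo/config <<'EOF'
[target.x86_64-unknown-linux-musl]
rustflags = ["-L", "native=/usr/lib", "-l", "static=stdc++"]
EOF
```

With this variant the binary carries its own copy of libstdc++, so the deploy image no longer needs apk add libstdc++.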

daniel-brenot-apcapital commented 1 year ago

This is great. It's a holiday where I'm from right now, but on Monday I'll create a PR with an update to the readme to let people who run into this problem know about this solution, or you may do so yourself. Once that's done I'm sure the issue can be closed and the bounty will be sent. Thank you for all your hard work!

daniel-brenot commented 1 week ago

Can we close this so @abcpro1 gets the bounty? I don't have that email account anymore, and they solved the problem for me back then.

Jasper-Bekkers commented 1 week ago

Sure thing!