erickt / rust-zmq

Rust zeromq bindings.
Apache License 2.0

[OS X, v0.8] Crash when zmq::Socket::recv_bytes() is interrupted #184

Closed: jjpe closed this issue 6 years ago

jjpe commented 7 years ago

When my code (which uses rust-zmq and currently runs on OS X) is interrupted, the whole program panics. I have verified that, according to libc, error code 4 is EINTR (see here for details).

Yet error code 4 is not recognized by zmq::Error::from_raw, which leads to a panic.
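One way to cross-check that code 4 is EINTR on a given Unix target, independent of both libc and libzmq, is to let Rust's standard library decode it. A minimal std-only sketch; it only holds on Unix targets (on Windows, raw code 4 means something else):

```rust
use std::io::{Error, ErrorKind};

fn main() {
    // Wrap the raw OS error code from the panic message in an io::Error.
    let err = Error::from_raw_os_error(4);
    // On Unix targets, std classifies EINTR as ErrorKind::Interrupted,
    // and Display uses the platform's error string.
    println!("{:?}: {}", err.kind(), err);
    if err.kind() == ErrorKind::Interrupted {
        println!("code 4 is EINTR on this target");
    }
}
```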

I would like to discuss 2 changes:

The backtrace:

thread '<unnamed>' panicked at 'unknown error [4]: Interrupted system call', /Users/j/.cargo/registry/src/github.com-1ecc6299db9ec823/zmq-0.8.1/src/lib.rs:307
stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
   1: std::panicking::default_hook::{{closure}}
   2: std::panicking::default_hook
   3: std::panicking::rust_panic_with_hook
   4: std::panicking::begin_panic
   5: std::panicking::begin_panic_fmt
   6: zmq::Error::from_raw
   7: zmq::errno_to_error
   8: zmq::Socket::recv
   9: zmq::Socket::recv_msg
  10: zmq::Socket::recv_bytes
  11: libcereal::component::CComponent::receive
  12: libcereal::amplify::client::CClient::receive
  13: libcereal_module::Fcclient_receive
  14: Finternal_module_call
  15: Ffuncall
  16: Fapply
  17: eval_sub
  18: funcall_lambda
  19: apply_lambda
  20: eval_sub
  21: Fif
  22: eval_sub
  23: FletX
  24: eval_sub
  25: Feval
  26: Ffuncall
  27: exec_byte_code
  28: Ffuncall
  29: exec_byte_code
  30: Ffuncall
  31: Ffuncall_interactively
  32: Ffuncall
  33: Fcall_interactively
  34: Ffuncall
  35: exec_byte_code
  36: funcall_lambda
  37: Ffuncall
  38: Fapply
  39: Ffuncall
  40: exec_byte_code
  41: Ffuncall
  42: exec_byte_code
  43: Ffuncall
  44: call1
  45: command_loop_1
  46: Fexecute_kbd_macro
  47: Fcall_last_kbd_macro
  48: Ffuncall
  49: exec_byte_code
  50: funcall_lambda
  51: apply_lambda
  52: eval_sub
  53: funcall_lambda
  54: Ffuncall
  55: Ffuncall_interactively
  56: Ffuncall
  57: Fapply
  58: Fcall_interactively
  59: Ffuncall
  60: exec_byte_code
  61: funcall_lambda
  62: Ffuncall
  63: Fapply
  64: Ffuncall
  65: exec_byte_code
  66: Ffuncall
  67: exec_byte_code
  68: Ffuncall
  69: call1
  70: command_loop_1
  71: internal_condition_case
  72: command_loop_2
  73: internal_catch
  74: command_loop
  75: recursive_edit_1
  76: Frecursive_edit
  77: main
fatal runtime error: failed to initiate panic, error 5

UPDATE: To deepen the mystery, I have just cloned the repo, checked out the release/v0.8 branch, and changed the from_raw definition to:

    pub fn from_raw(raw: i32) -> Error {
        #![cfg_attr(feature = "clippy", allow(match_same_arms))]
        match raw {
            4 => Error::EINTR,
            errno::EACCES             => Error::EACCES,
            errno::EADDRINUSE         => Error::EADDRINUSE,
            errno::EAGAIN             => Error::EAGAIN,
            errno::EBUSY              => Error::EBUSY,
            errno::ECONNREFUSED       => Error::ECONNREFUSED,
            errno::EFAULT             => Error::EFAULT,
            errno::EHOSTUNREACH       => Error::EHOSTUNREACH,
            errno::EINPROGRESS        => Error::EINPROGRESS,
            errno::EINVAL             => Error::EINVAL,
            errno::EMFILE             => Error::EMFILE,
            errno::EMSGSIZE           => Error::EMSGSIZE,
            errno::ENAMETOOLONG       => Error::ENAMETOOLONG,
            errno::ENODEV             => Error::ENODEV,
            errno::ENOENT             => Error::ENOENT,
            errno::ENOMEM             => Error::ENOMEM,
            errno::ENOTCONN           => Error::ENOTCONN,
            errno::ENOTSOCK           => Error::ENOTSOCK,
            errno::EPROTO             => Error::EPROTO,
            errno::EPROTONOSUPPORT    => Error::EPROTONOSUPPORT,
            errno::ENOTSUP            => Error::ENOTSUP,
            errno::ENOBUFS            => Error::ENOBUFS,
            errno::ENETDOWN           => Error::ENETDOWN,
            errno::EADDRNOTAVAIL      => Error::EADDRNOTAVAIL,
            errno::EINTR              => Error::EINTR,
            156384714                => Error::EPROTONOSUPPORT,
            156384715                => Error::ENOBUFS,
            156384716                => Error::ENETDOWN,
            156384717                => Error::EADDRINUSE,
            156384718                => Error::EADDRNOTAVAIL,
            156384719                => Error::ECONNREFUSED,
            156384720                => Error::EINPROGRESS,
            156384721                => Error::ENOTSOCK,
            156384763                => Error::EFSM,
            156384764                => Error::ENOCOMPATPROTO,
            156384765                => Error::ETERM,
            156384766                => Error::EMTHREAD,

            x => {
                unsafe {
                    let s = zmq_sys::zmq_strerror(x);
                    panic!("unknown error [{}]: {}",
                        x,
                        str::from_utf8(ffi::CStr::from_ptr(s).to_bytes()).unwrap()
                    )
                }
            }
        }
    }
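For context, the large literals in the match above are libzmq's own error numbers. zmq.h defines them as offsets from ZMQ_HAUSNUMERO (156384712): fallbacks for errno values a platform may not provide (for example EPROTONOSUPPORT at +2) and 0MQ's own errors such as EFSM (+51) and ETERM (+53). A small sketch decoding the four 0MQ-specific codes from that base; the helper name zmq_native_name is hypothetical:

```rust
// Value of ZMQ_HAUSNUMERO as defined in libzmq's zmq.h.
const ZMQ_HAUSNUMERO: i32 = 156_384_712;

// 0MQ-specific error codes, expressed as offsets from ZMQ_HAUSNUMERO.
const EFSM: i32 = ZMQ_HAUSNUMERO + 51;
const ENOCOMPATPROTO: i32 = ZMQ_HAUSNUMERO + 52;
const ETERM: i32 = ZMQ_HAUSNUMERO + 53;
const EMTHREAD: i32 = ZMQ_HAUSNUMERO + 54;

/// Hypothetical helper: names `raw` if it is one of the four 0MQ-specific
/// codes, otherwise returns None.
fn zmq_native_name(raw: i32) -> Option<&'static str> {
    match raw {
        EFSM => Some("EFSM"),
        ENOCOMPATPROTO => Some("ENOCOMPATPROTO"),
        ETERM => Some("ETERM"),
        EMTHREAD => Some("EMTHREAD"),
        _ => None,
    }
}

fn main() {
    for &raw in &[156_384_763, 156_384_765, 4] {
        println!("{} -> {:?}", raw, zmq_native_name(raw));
    }
}
```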

Compared with the release/v0.8 branch, the only change is the added 4 => Error::EINTR, arm. The result is a compiler warning:

warning: unreachable pattern
   --> src/lib.rs:294:13
    |
294 |             errno::EINTR              => Error::EINTR,
    |             ^^^^^^^^^^^^
    |
    = note: #[warn(unreachable_patterns)] on by default

Notably, though, my program no longer crashes. So rustc apparently treats 4 and errno::EINTR as equal, yet at runtime on OS X that does not seem to hold; otherwise the unpatched match would have taken the errno::EINTR arm rather than the default case.
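The warning itself is informative: in a Rust match, a path to a constant such as errno::EINTR is a constant pattern compared by value, so rustc only reports that arm as unreachable when the constant equals an earlier literal. A minimal sketch, with a hypothetical local EINTR constant standing in for errno::EINTR:

```rust
// Hypothetical stand-in for errno::EINTR; EINTR is 4 on both Linux and macOS.
const EINTR: i32 = 4;

#[derive(Debug, PartialEq)]
enum Kind {
    Interrupted,
    Other(i32),
}

// A constant used as a pattern is compared by value, so once the literal `4`
// arm exists, rustc reports the `EINTR` arm as unreachable.
#[allow(unreachable_patterns)]
fn classify(raw: i32) -> Kind {
    match raw {
        4 => Kind::Interrupted,
        EINTR => Kind::Interrupted, // never taken: same value as the arm above
        x => Kind::Other(x),
    }
}

fn main() {
    println!("{:?}", classify(4)); // prints "Interrupted"
    println!("{:?}", classify(11)); // prints "Other(11)"
}
```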

rotty commented 7 years ago

I'm reluctant to add an Unknown variant to zmq::Error, as you suggest, since that would penalize all uses of it, turning it from a C-like enum into a full-blown sum type. The default panic case should not be reached, as you note yourself; we should get to the bottom of why the match apparently misbehaves on OS X.
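To make the trade-off concrete, here is a sketch contrasting the two designs with hypothetical two-variant miniatures (not rust-zmq's actual definitions). The Unknown(i32) payload is an assumption about what such a variant would carry:

```rust
use std::mem::size_of;

// Hypothetical C-like miniature: each discriminant is the raw error code
// (EINTR is 4 on Linux and macOS; ETERM is ZMQ_HAUSNUMERO + 53).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CLikeError {
    EINTR = 4,
    ETERM = 156_384_765,
}

// Hypothetical sum-type alternative with a catch-all variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SumTypeError {
    EINTR,
    ETERM,
    Unknown(i32),
}

// With an Unknown variant the conversion is total: no panic arm is needed.
fn sum_from_raw(raw: i32) -> SumTypeError {
    match raw {
        4 => SumTypeError::EINTR,
        156_384_765 => SumTypeError::ETERM,
        x => SumTypeError::Unknown(x),
    }
}

fn main() {
    // A C-like enum converts back to its raw code with a plain `as` cast;
    // `SumTypeError::ETERM as i32` would not compile, because one variant has a field.
    let code = CLikeError::ETERM as i32;
    println!("{} -> {:?}", code, sum_from_raw(code));
    println!("{} -> {:?}", CLikeError::EINTR as i32, sum_from_raw(4));
    println!("42 -> {:?}", sum_from_raw(42));
    // On current rustc the payload variant also makes every value larger
    // than the bare integer (exact enum layout is not guaranteed).
    println!("{} vs {} bytes", size_of::<CLikeError>(), size_of::<SumTypeError>());
}
```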

jjpe commented 7 years ago

OK, I'm all for fixing this issue at its root. Do you have any ideas on where we could start? To be honest, I'm not even sure what exactly goes wrong, i.e. why rustc apparently sees EINTR and 4 as equal while the running program does not.

jjpe commented 7 years ago

There hasn't been much progress here, but my project is not far from a 1.0.0 release, at which point I'd like to publish the Rust components to crates.io.

To do that, however, all dependencies of my crates must also be on crates.io; cargo publish enforces this.

This poses a conundrum: how can I release while this issue is still unresolved? It has effectively forced me to fork this project, which, as far as crates.io is concerned, means an unpublishable dependency.

The only 2 things I can think of are:

rotty commented 7 years ago

@jjpe: Issue #191 looks similar and was apparently fixed by commit 752765b2, which is included in the (very recent) 0.8.2 release. Based on the code posted in that issue, I've put together the following minimal test case:

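extern crate zmq;
extern crate libc;

// No-op handler: with it installed, Ctrl+C (SIGINT) no longer terminates the
// process, so it merely interrupts the blocking receive below (EINTR).
extern "C" fn s_signal_handler(_: i32) {}

fn main() {
    let ctx = zmq::Context::new();

    unsafe {
        libc::signal(libc::SIGINT, s_signal_handler as libc::sighandler_t);
    }

    // Nothing ever connects to this socket, so recv_msg blocks until interrupted.
    let pull_socket = ctx.socket(zmq::PULL).unwrap();
    let _ = pull_socket.recv_msg(0).unwrap();
}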

If I run the above program from the command line on a Debian amd64 box and hit Ctrl+C, I get, as expected:

^Cthread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Interrupted system call', /checkout/src/libcore/result.rs:860:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Can you check whether the above sample code is still broken with v0.8.2?

jjpe commented 6 years ago

Just tested against 0.8.2.

As far as I can see, it all works properly now, both in your snippet and in my own project. Thanks for the fix!

gibsond commented 5 years ago

I get this problem with v0.9, but only in a release build (cargo build --release):

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Interrupted system call', src/libcore/result.rs:1009:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:71
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:211
   2: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:227
             at src/libstd/panicking.rs:491
   3: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:398
   4: rust_begin_unwind
             at src/libstd/panicking.rs:325
   5: core::panicking::panic_fmt
             at src/libcore/panicking.rs:95
   6: core::result::unwrap_failed
   7: es1800_zmqsubr::main
   8: std::rt::lang_start::{{closure}}
   9: main
  10: __libc_start_main
  11: _start
Aborted (core dumped)

It works fine with just cargo run, but I need release code.

Any thoughts?

gibsond commented 5 years ago

I apologize for the above message. I don't understand why it worked with cargo build but not with cargo build --release; however, I 'fixed' the problem by changing:

let stuff = socket.recv_multipart(0).unwrap();

to:

let stuff: Vec<Vec<u8>> = match socket.recv_multipart(0) {
    Ok(msg) => msg,
    Err(error) => {
        // Any receive error, including the EINTR caused by Ctrl+C, is logged instead of panicking.
        println!("zmq receive error: {:?}", error);
        vec![vec![24, 25, 26]]
    }
};

This catches the error zmq returns on Ctrl+C instead of panicking: it just prints a message and lets my Ctrl+C handler deal with the signal, as I wanted.
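The same idea can be written as a reusable loop that treats only EINTR as the Ctrl+C case and still propagates every other error. A minimal sketch, assuming a hypothetical RecvError enum in place of zmq::Error (with rust-zmq one would presumably match Err(zmq::Error::EINTR)) and a hypothetical SHUTDOWN flag that the signal handler sets:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for zmq::Error; only EINTR vs. everything else matters here.
#[derive(Debug, PartialEq)]
enum RecvError {
    Interrupted,
    Other(i32),
}

// Hypothetical flag that a real Ctrl+C handler would set.
static SHUTDOWN: AtomicBool = AtomicBool::new(false);

/// Calls `recv` until it yields a message. Interrupted calls are retried unless
/// shutdown was requested, in which case Ok(None) is returned; any other error
/// is handed back to the caller instead of panicking.
fn recv_or_shutdown<T, F>(mut recv: F) -> Result<Option<T>, RecvError>
where
    F: FnMut() -> Result<T, RecvError>,
{
    loop {
        match recv() {
            Ok(msg) => return Ok(Some(msg)),
            Err(RecvError::Interrupted) => {
                if SHUTDOWN.load(Ordering::SeqCst) {
                    return Ok(None);
                }
                // Interrupted by a signal, but no shutdown requested: try again.
            }
            Err(other) => return Err(other),
        }
    }
}

fn main() {
    // Simulate a receive that is interrupted once before a message arrives.
    let mut calls = 0;
    let got = recv_or_shutdown(|| {
        calls += 1;
        if calls == 1 {
            Err(RecvError::Interrupted)
        } else {
            Ok(vec![1u8, 2, 3])
        }
    });
    println!("{:?}", got);

    // Simulate Ctrl+C: the handler sets the flag and the receive reports EINTR.
    SHUTDOWN.store(true, Ordering::SeqCst);
    let stopped: Result<Option<Vec<u8>>, RecvError> =
        recv_or_shutdown(|| Err(RecvError::Interrupted));
    println!("{:?}", stopped);
}
```

Returning Ok(None) rather than a placeholder message lets the caller tell "shut down cleanly" apart from "received data".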

elichai commented 3 years ago

@rotty I can confirm this still happens to me with 0.9. Specs:

MacBook Pro (16-inch, 2019)
MacOS Big Sur 11.4 (20F71)
Rust 1.52.1
zmq 0.9

Code:

extern "C" {
    fn signal(_: i32, _: usize) -> usize;
}

extern "C" fn s_signal_handler(_: i32) {}

fn main() {
    let ctx = zmq::Context::new();

    unsafe {
        signal(2, s_signal_handler as usize);
    }

    let pull_socket = ctx.socket(zmq::PULL).unwrap();
    let _ = pull_socket.recv_msg(0).unwrap();
}

I'm then signaling ctrl+C and this is the output:

^Cthread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Interrupted system call', src/main.rs:15:37
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace