eclipse-iceoryx / iceoryx2

Eclipse iceoryx2™ - true zero-copy inter-process-communication in pure Rust
https://iceoryx.io
Apache License 2.0
509 stars 23 forks source link

benchmark-publish subscripe panic #81

Closed fengtuo58 closed 1 month ago

fengtuo58 commented 8 months ago

Bug report after analysis

bindgen-0.69.2 is unable to handle macro redefines, see: https://github.com/rust-lang/rust-bindgen/issues/2722

The signal SIGSTOP and SIGCONT are redefined under ubuntu 20.04 in the included file bits/signum.h of signal.h. The redefinition turns SIGCONT into the unfetchable signal SIGSTOP which leads to a failure in the sigaction call.

The quick fix is to comment out the Continue signal in the abstraction, since it is also nowhere used. When the bindgen bug report is closed, we can revert the change and close the issue.

  1. [x] comment out Continue signal in signal.rs
  2. [ ] comment in Continue signal after bindgen bugfix

Original bug report from @fengtuo58

This should never happen! Unable to register raw sional since sigaction was called with invalid parameters. thread " iceorvx2-bb/posix/src/signal.rs:516:13: panicked at rom: SignalHandler ( registered signals: 11 ) :.: This should never happen! Unable to register raw signal since sigaction was called with inve lid parameters note: run with RUST BACKTRACE=1' environment variable to display a backtrace 12 [F] SianalHandler f registered sianals: [7 This should never happen! nable to register raw sional since sigaction was called with invalid parameters thread ' ' panicked at iceoryx2-bb/posix/src/sianal,rs:516:13: rom: SianalHandler ! registered sianals: : This should never happen: Unable to register raw signal since sigaction was called with inva lid parameters . stack backtrace:

fengtuo58 commented 8 months ago

anyone know what reason?

elfenpiff commented 8 months ago

@fengtuo58 what OS and Rust Version are you using?

fengtuo58 commented 8 months ago

Linux tegra-ubuntu 5.15.98-rt-tegra #1 SMP PREEMPT RT Fri Jan 12 16:33:29 CST 2024 aarch64 aarch64 aarch64 GNU/Linux @elfenpiff

elfenpiff commented 8 months ago

@fengtuo58

The bug is fixed and merged to main . Could you please confirm that it is fixed by checking out main and rerun the benchmark?

elBoberido commented 8 months ago

@fengtuo58 btw, I noticed you are related to BYD and am wondering whether BYD would be interested in accelerating the development of iceoryx2

passchaos commented 8 months ago

The problem still exists, platform info: Linux tegra 5.10.120-tegra #2 SMP PREEMPT Thu Jan 4 16:55:19 CST 2024 aarch64 aarch64 aarch64 GNU/Linux image

elBoberido commented 8 months ago

@fengtuo58 do you have more output? It is cut off right before it looked to become interesting

elfenpiff commented 8 months ago

@elBoberido

I cannot reproduce it locally, but it seems to occur when the shared memory is zeroed. We zero it to ensure that the target has really enough memory available. If this fails SIGABRT or another signal is raised which we fetch to provide a detailed error message.

I suspect that linux tegra maybe has a different subset of fetchable signals than other linux distributions. I added a more detailed error output to identify this signal which is registered via sigaction.

@passchaos @fengtuo58

Could you please retry the benchmark from current main and provide the log full log output? Hopefully, this identifies the signal and helps to fix this issue. Thank you in advance for supporting us here!

passchaos commented 8 months ago

@elBoberido @elfenpiff I tried the latest main branch code and attached the crash log. crash.log

passchaos commented 8 months ago

In fact, Ubuntu 20.04 x86_64 also had this problem and did not try other versions.

passchaos commented 8 months ago

@elBoberido @elfenpiff I have found the reason for this problem. Under ubuntu,SIGCONT actually uses the value 18, but the value in the automatically generated bindings is 19, which actually corresponds to SIGSTOP, but this signal cannot use sigaction function

image

elBoberido commented 8 months ago

@passchaos whoa, can you share something about your setup? bindgen should actually take care of this. This means it is either a bug in bindgen or you have a quite exotic setup.

elfenpiff commented 8 months ago

@passchaos I can confirm the bug with ubuntu 20.04, and I think this is a bug in the underlying bindgen since the maintainer did underestimate what "interesting" things you can do with C.

So what is the problem, I go over what the C pre-processor is doing.

  1. Some process includes /usr/include/signal.h - in this case our platform with the signal abstraction
  2. In this header we include /usr/include/x86_64-linux-gnu/bits/signum.h 2.1 Here we include /usr/include/x86_64-linux-gnu/bits/signum-generic.h, this defines all signals als C macros 2.2 10 lines below we undef all signals again and redefine them

In step 2.2 bindgen seems to lose it. It keeps the old macro values and this causes a screw up. But I have no idea why in the first place the value of SIGCONT and SIGSTOP are swapped, but they are.

So somehow we have to make bindgen aware that here are potentially macros defined, undefined, defined again.

elfenpiff commented 8 months ago

@elBoberido @passchaos

So the bug is in bindgen-0.65.1 (and current 0.69.2). The following C code is not translated correctly.

#define A 1
#define B 2

#undef A
#define A 2
#undef B
#define B 1

The result should that A == 2 and B == 1 but the generated file shows

/* automatically generated by rust-bindgen 0.65.1 */

pub const A: u32 = 1;
pub const B: u32 = 2;
elfenpiff commented 8 months ago

Here is the bindgen bugreport: https://github.com/rust-lang/rust-bindgen/issues/2722

elfenpiff commented 8 months ago

@passchaos @fengtuo58 The bug is fixed and iceoryx2 0.2.2 is released.

elBoberido commented 7 months ago

@passchaos @fengtuo58 would be great if you could confirm that it is also fixed for you and then close the issue :)

passchaos commented 7 months ago

@elBoberido I confirm that the issue are fixed, both for tegra and ubuntu.

elBoberido commented 7 months ago

Oh, totally forgot that just a workaround was merged and we need to wait for a proper fix in bindgen.

fengtuo58 commented 7 months ago

It is not work at this commit commit 891ae9bc4e26c95c0e067140d93240bba4c6ca86 (origin/main, origin/HEAD)

./benchmark-publish-subscribe 11 [F] SignalHandler { registered_signals: [] } | This should never happen! Unable to register raw signal since sigaction was called with invalid parameters. thread '' panicked at iceoryx2-bb/posix/src/signal.rs:515:13: From: SignalHandler { registered_signals: [] } ::: This should never happen! Unable to register raw signal since sigaction was called with invalid parameters. note: run with RUST_BACKTRACE=1 environment variable to display a backtrace 12 [F] SignalHandler { registered_signals: [] } | This should never happen! Unable to register raw signal since sigaction was called with invalid parameters. thread '' panicked at iceoryx2-bb/posix/src/signal.rs:515:13: From: SignalHandler { registered_signals: [] } ::: This should never happen! Unable to register raw signal since sigaction was called with invalid parameters.

fengtuo58 commented 7 months ago

It is not work at this commit commit 891ae9bc4e26c95c0e067140d93240bba4c6ca86 (origin/main, origin/HEAD)

./benchmark-publish-subscribe 11 [F] SignalHandler { registered_signals: [] } | This should never happen! Unable to register raw signal since sigaction was called with invalid parameters. thread '' panicked at iceoryx2-bb/posix/src/signal.rs:515:13: From: SignalHandler { registered_signals: [] } ::: This should never happen! Unable to register raw signal since sigaction was called with invalid parameters. note: run with RUST_BACKTRACE=1 environment variable to display a backtrace 12 [F] SignalHandler { registered_signals: [] } | This should never happen! Unable to register raw signal since sigaction was called with invalid parameters. thread '' panicked at iceoryx2-bb/posix/src/signal.rs:515:13: From: SignalHandler { registered_signals: [] } ::: This should never happen! Unable to register raw signal since sigaction was called with invalid parameters.

elBoberido commented 7 months ago

@fengtuo58 the output you posted cannot be from the current main branch. This is the current string

"This should never happen! posix::sigaction returned {}. Unable to register raw signal since sigaction was called with invalid parameters: {:?} which lead to error: {:?}."

So there must be posix::sigaction returned and which lead to error: in the error string. Can you please check that the you have all the updates?