NeuronRobotics / nrjavaserial

A Java Serial Port system. This is a fork of the RXTX project that uses in jar loading of the native code.
Other
345 stars 143 forks source link

Fix segfault if file descriptor unavailable #249

Closed claui closed 7 months ago

claui commented 8 months ago

The get_java_var_long function returns 0 in several failure modes, e.g. if a file descriptor is unavailable.

However, one of the call sites is missing the result check, which causes a JVM segfault if the return value is 0. The segfault occurs on dereferencing the pointer:

eis->eventflags[SPE_DATA_AVAILABLE]

Add a result value check, throwing a proper IOException if it is 0.

See also similar issue #59.

Fixes #112, #136 and #242.

claui commented 7 months ago

I can confirm that this fix is independent of PR #211. Our tests on our hardware suggest we need both this PR and #211 to keep the JVM from segfaulting.

Both segfaults are easy to tell apart because they have unique fingerprints in the error log:

The segfault being tackled here always occurs in read_byte_array. It always has 0x0000000000000008 as the unmapped address being dereferenced. (The number 8 comes from the struct offset SPE_DATA_AVAILABLE).

Segfaults being addressed by PR #211 are occurring outside of read_byte_array, e.g. in interruptEventLoop.
The code in interruptEventLoop is interpreting stack garbage as an event_info_struct, causing a crash trying to dereference the next "pointer". The unmapped address mentioned in the crash log is not the number 8, but random stack garbage (e.g. 0x0000000d00000000).

MrDOS commented 7 months ago

Thank you for digging so deep into these issues. The check you're introducing here is perfectly sensible. What initially confused me was how this code is getting called at all with RXTXPort.eis == NULL. The only place I can find where eis is reset to 0 is if the event loop returns without throwing anything:

https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/java/gnu/io/RXTXPort.java#L108-L111

It looks like that happens when the event loop is shut down via RXTXPort.removeEventListener() (which is one of the things done by RXTXPort.close()):

https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/java/gnu/io/RXTXPort.java#L914

https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/c/src/SerialImp.c#L4984

https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/c/src/SerialImp.c#L4261-L4268

The read_byte_array() function you're fixing here is called under the hood by all the different RXTXPort.read() methods. All of those are careful to check that the monitor thread hasn't been interrupted before issuing the read. They're careful to acquire a lock on the I/O mutex. They're even all synchronized.

https://github.com/NeuronRobotics/nrjavaserial/blob/0df8b60485a56d7698b71183237b5615d02a8194/src/main/java/gnu/io/RXTXPort.java#L1482-L1492

...but because access to the monThreadisInterrupted property isn't synchronized (either in read() or in removeEventListener()), there's still a race condition whereby the monitor thread can pack up and go home in between the check and the native read code call :man_facepalming: It's a very narrow one, sure, but given that many (most?) applications will call read() in a tight loop, it's no surprise that it gets hit eventually.

I'll happily merge this fix now – thank you very much for the contribution. And I'll modify https://github.com/NeuronRobotics/nrjavaserial/pull/211 to protect the monThreadisInterrupted checks inside the read() methods with a read lock on monitorThreadState.