NeuronRobotics / nrjavaserial

A Java Serial Port system. This is a fork of the RXTX project that uses in jar loading of the native code.
Other
344 stars 142 forks source link

Buffer Overflows in Native Serial Implementation with Long Device Names #229

Closed david-pace closed 1 year ago

david-pace commented 1 year ago

I am currently troubeshooting an issue that occurs in nrjavaserial-5.2.1 when connecting to a certain serial port on an openHAB system. For reference, there is also a thread in the openHAB community about this, but so far no one had an idea about how it could be fixed. So I hoped maybe you guys could help me identify and fix the issue.

Platform information: Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-122-generic x86_64)

Background: I use two different serial devices connected via USB. My system sometimes changes the assignments of the USB ports to tty devices, i.e. sometimes a certain device will be mounted at /dev/ttyUSB0 and sometimes at /dev/ttyUSB1. The assignments can change upon every reboot and the behavior seems to be non-deterministic. Luckily there are some symlinks with stable names located at /dev/serial/by-id:

grafik

I am trying to use those symlinks for my openHAB devices. The issue is that one of those symlinks can be used without any problems, while the other crashes the complete JVM when connected to.

This one works without any issues:

/dev/serial/by-id/usb-SHK_NANO_CUL_433-if00-port0

while this one crashes the JVM:

 /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0

I debugged the issue remotely and found out that the crash happens when calling the native open() method in class gnu.io.RXTXPort. The call is at line 173:

fd = open( name );

the native method called is:

private native synchronized int open( String name ) throws PortInUseException;

First I suspected a threading issue, but even when deactivating the NanoCUL device the issue still occurs, so concurrency does not seem to be the problem.

Do you guys have any other idea why this could happen? The only reason I can think of right now is the length of the device names. The NanoCUL device name has 50 characters, while the other one (it is a Zigbee USB stick) has 66 characters. Are you aware of any limitations regarding the device name length? Or do you have any other ideas why the crash might occur?

Here are the two stack traces of the calls:

From the openHAB Serial Binding with device name /dev/serial/by-id/usb-SHK_NANO_CUL_433-if00-port0 (this call works):

Thread [OH-safeCall-1] (Suspended (breakpoint at line 173 in RXTXPort)) 
    RXTXPort.<init>(String) line: 173   
    RXTXCommDriver.getCommPort(String, int) line: 983   
    CommPortIdentifier.open(String, int) line: 467  
    SerialPortIdentifierImpl.open(String, int) line: 53 
    SerialBridgeHandler.initialize() line: 138  
    GeneratedMethodAccessor91.invoke(Object, Object[]) line: not available  
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
    Method.invoke(Object, Object...) line: 566  
    InvocationHandlerSync<T>(AbstractInvocationHandler<T>).invokeDirect(Invocation) line: 154   
    Invocation.call() line: 52  
    FutureTask<V>.run() line: 264   
    QueueingThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1128  
    ThreadPoolExecutor$Worker.run() line: 628   
    Thread.run() line: 829

From the Zigbee binding with device name /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0 (this one crashes the JVM):

Daemon Thread [OH-thingHandler-5] (Suspended (breakpoint at line 173 in RXTXPort))  
    owns: EmberHandler  (id=705)    
    RXTXPort.<init>(String) line: 173   
    RXTXCommDriver.getCommPort(String, int) line: 983   
    CommPortIdentifier.open(String, int) line: 467  
    SerialPortIdentifierImpl.open(String, int) line: 53 
    ZigBeeSerialPort.open(int, ZigBeePort$FlowControl) line: 157    
    ZigBeeSerialPort.open() line: 123   
    ZigBeeDongleEzsp.initialiseEzspProtocol() line: 1289    
    ZigBeeDongleEzsp.initialize() line: 432 
    ZigBeeNetworkManager.initialize() line: 418 
    EmberHandler(ZigBeeCoordinatorHandler).initialiseZigBee() line: 431 
    EmberHandler(ZigBeeCoordinatorHandler).lambda$2() line: 557 
    354614459.run() line: not available 
    Executors$RunnableAdapter<T>.call() line: 515   
    ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(FutureTask<V>).run() line: 264   
    ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 304  
    WrappedScheduledExecutorService(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1128 
    ThreadPoolExecutor$Worker.run() line: 628   
    Thread.run() line: 829

I could not find any JVM crash logs or core dumps. Please let me know if I can provide any further information.

I would be happy to contribute a fix if the issue turns out to be in the Java code (as opposed to the native code).

fwolter commented 1 year ago

There are indeed several unbounded sprintf writes to a logging message buffer of 80 bytes containing the device name: https://github.com/NeuronRobotics/nrjavaserial/blob/master/src/main/c/src/SerialImp.c#L669 and following.

david-pace commented 1 year ago

This could be the issue indeed. While looking through the C code I found the following issues:

Effective error messages that cause buffer overflows when using my device name as example:

open: locking has failed for /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (95 characters)
open: locking worked for /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (91 characters)
open: exclusive access denied for /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (100 characters)
testRead: /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0 is good!\n (85 characters)
RXTX fhs_lock() Error: creating lock file for: /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0: %s\n (117+ characters)
uucp_lock: device was /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n (88 characters)

I could try to provide a pull request, but I have to declare that while I'm an experienced Java programmer, I don't know much about memory management in C. So I don't know whether just raising the buffer sizes would solve the problem, plus I don't know how to test this properly. Would anyone else who is more familiar with the code and knows how this can be tested be willing to fix this?

fwolter commented 1 year ago

Thanks for the research. Most points should be very easy to fix by replacing sprintf by snprintf and adding sizeof(message) as the additional second argument. The risk of breaking anything by these changes should be low.

Printing an uninitialized buffer should simply be removed I guess.

david-pace commented 1 year ago

Created PR #230, it would be great if you could review and test it as I don't have the required toolchains installed.

david-pace commented 1 year ago

Nice, thanks for merging this :+1:

@MrDOS do you know whether a nrjavaserial release is planned eventually?

@fwolter do you know the steps required to get a new nrjavaserial version into openHAB and can I assist in any way?

wborn commented 1 year ago

Thanks for fixing these bugs @david-pace!

I'd be happy to integrate a new nrjavaserial version in openHAB when it's released. :slightly_smiling_face:

It would also allow us to use an official release again, since we've been using our own "5.2.1.OH1" build to be able to use the fix for the annoying #111 bug :bug: :hammer:

So all changes up to 7aa21d1dc8cecdc8daad3ebc40273cfb8179e9d2 are well tested since they are part of OH 3.3.0 which was released ~6 months ago.

david-pace commented 1 year ago

Would it possible to create a new stable release of nrjavaserial @madhephaestus @crea-doo @kaikreuzer @MrDOS, and can we do anything to help?