gary-rowe / hid4java

A cross-platform Java Native Access (JNA) wrapper for the libusb/hidapi library. Works out of the box on Windows/Mac/Linux.
MIT License

Threading (?) causing glibc/low level crashes #155

Open fricpa opened 1 month ago

fricpa commented 1 month ago

Platform

Knowing the platform greatly narrows down the potential causes of the problem.

To Reproduce

Steps to reproduce the behavior:

Write a trivial program:

HidServices hidServices = HidManager.getHidServices(new HidServicesSpecification());
while (true) {
    hidServices.getAttachedHidDevices();
}

Let it run for a while on the specified platforms.

Expected behavior

Runs without issues forever.

Screenshots and logs

I have observed three crash modes so far (note that I have a little script running the app and logging some stuff, but the basic program is as above):

All of them often appear within a few minutes of running that loop. Sometimes, however, they don't appear for a long time, or only after I have plugged in some devices and read/written some data to them.

1.

2024-07-18T10:53:34,225 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:34,226 INFO  [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:34,227 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:34,228 INFO  [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:34,229 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
double free or corruption (!prev)
./run.sh: line 7:  1484 Aborted                 MAVEN_OPTS="-ea" mvn package exec:java "-Dexec.mainClass=org.example.Main"
FATAL ERROR EXIT CODE 134 AT ./run.sh:7

2.

2024-07-18T10:53:44,120 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:44,120 INFO  [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:44,120 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
corrupted size vs. prev_size
./run.sh: line 7:  2845 Aborted                 MAVEN_OPTS="-ea" mvn package exec:java "-Dexec.mainClass=org.example.Main"
FATAL ERROR EXIT CODE 134 AT ./run.sh:7

3.


2024-07-18T10:53:44,120 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:44,120 INFO  [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:44,120 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x0000007fa7bada9c, pid=2754, tid=2802
    #
    # JRE version: OpenJDK Runtime Environment (17.0.11+9) (build 17.0.11+9-Debian-1deb12u1)
    # Java VM: OpenJDK 64-Bit Server VM (17.0.11+9-Debian-1deb12u1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
    # Problematic frame:
    # C  [libc.so.6+0x8da9c]
    [timeout occurred during error reporting in step "printing problematic frame"] after 30 s.
    # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /home/pi/hid4java-apd-test/hs_err_pid2754.log
    # [ timer expired, abort... ]
    ./run.sh: line 7:  2754 Aborted                 MAVEN_OPTS="-ea" mvn package exec:java "-Dexec.mainClass=org.example.Main"
    FATAL ERROR EXIT CODE 134 AT ./run.sh:7

Or, on Ubuntu 24.04:

2024-07-18T11:07:20,940 INFO  [org.example.Main.main()] org.example.Main - =======================
2024-07-18T11:07:20,940 INFO  [org.example.Main.main()] org.example.Main - enumerate hid devices...
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000074a8142ab7ec, pid=12909, tid=12965
#
# JRE version: OpenJDK Runtime Environment (11.0.23+9) (build 11.0.23+9-post-Ubuntu-1ubuntu1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.23+9-post-Ubuntu-1ubuntu1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0xab7ec]
[timeout occurred during error reporting in step "printing problematic frame"] after 30 s.
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/ubuntu/hid4java-apd-test/core.12909)
#
# An error report file with more information is saved as:
# /home/ubuntu/hid4java-apd-test/hs_err_pid12909.log

Additional information

I have not observed any of these failure modes on amd64 Windows 10; the loop seems to run forever there, as it should.

However, on Linux it's definitely broken on every platform I tested.

It seems a lot of such issues can be caused by talking to native code from multiple Java threads:

https://stackoverflow.com/questions/22491797/java-double-free-or-corruption
https://stackoverflow.com/questions/49628615/understanding-corrupted-size-vs-prev-size-glibc-error

I don't quite understand why hid4java needs any threads in the first place


At least for my use case, all I would need is synchronous enumeration and synchronous read and write (with timeout), all of which are synchronous calls in hidapi.

FWIW, I have attached the hs_err log files: hs_err_pid2754.log, hs_err_pid12909.log

fricpa commented 1 month ago

For now, I have created a private fork of this repo and removed all thread-based functionality (scan thread, reader thread). With that change, the same infinite loop never crashes the program.
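For readers wondering how attach/detach detection could work without a scan thread: one option is to poll from the application's own thread and diff successive enumerations. The sketch below is not the code from the private fork. `DevicePoller`, `Changes`, and the `Supplier<Set<String>>` enumerator (a stand-in for a synchronous enumeration call, reduced here to device path strings) are hypothetical names used only for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper (not part of hid4java): detects attach/detach events
// by comparing two synchronous enumerations, with no background thread.
final class DevicePoller {
    // Assumption: stands in for a synchronous enumerate call that returns device paths.
    private final Supplier<Set<String>> enumerator;
    private Set<String> known = new LinkedHashSet<>();

    DevicePoller(Supplier<Set<String>> enumerator) {
        this.enumerator = enumerator;
    }

    // Call from the application's own thread whenever a refresh is wanted.
    Changes poll() {
        Set<String> current = new LinkedHashSet<>(enumerator.get());

        Set<String> attached = new LinkedHashSet<>(current);
        attached.removeAll(known);      // present now, absent at the last poll

        Set<String> detached = new LinkedHashSet<>(known);
        detached.removeAll(current);    // present at the last poll, absent now

        known = current;
        return new Changes(attached, detached);
    }

    // Result of one poll: what appeared and what disappeared since the previous poll.
    static final class Changes {
        final Set<String> attached;
        final Set<String> detached;

        Changes(Set<String> attached, Set<String> detached) {
            this.attached = attached;
            this.detached = detached;
        }
    }
}

final class DevicePollerDemo {
    public static void main(String[] args) {
        Set<String> present = new LinkedHashSet<>();   // fake device list for the demo
        DevicePoller poller = new DevicePoller(() -> present);

        present.add("/dev/hidraw0");
        DevicePoller.Changes changes = poller.poll();
        System.out.println("attached=" + changes.attached + " detached=" + changes.detached);
    }
}
```

Because `poll()` only runs when the caller invokes it, every enumeration happens on a thread the application controls.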

fricpa commented 1 month ago

FWIW, with hid4java:0.8.0 I have now also seen a fatal error/crash on Windows 10 at least once; hs_err log attached.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffd41751b40, pid=22844, tid=27032
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.23+9 (11.0.23+9) (build 11.0.23+9)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.23+9 (11.0.23+9, mixed mode, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# C  0x00007ffd41751b40
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

hs_err_pid22844.log

fricpa commented 1 month ago

With the threadless version I cannot reproduce this anymore; there must be some thread-safety requirements that the current implementation violates.

I would be fairly cautious about interacting with libhidapi.so, and even with JNA, in anything but a single-threaded or at least serialized fashion...
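A minimal sketch of the "single-threaded" option, assuming the application is free to route every native call through one dedicated thread. `NativeCallSerializer` and the thread name `native-calls` are hypothetical and not part of hid4java or JNA; the lambda in the demo is only a stand-in for a real native call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper: funnels every call onto one worker thread so the
// native library only ever sees a single caller thread.
final class NativeCallSerializer implements AutoCloseable {
    // One daemon worker; submitted tasks run strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "native-calls");
        t.setDaemon(true);
        return t;
    });

    // Runs the call on the worker thread and blocks the caller until it has finished.
    <T> T call(Supplier<T> nativeCall) {
        Callable<T> task = nativeCall::get;
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for native call", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("native call failed", e.getCause());
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}

final class NativeCallSerializerDemo {
    public static void main(String[] args) {
        try (NativeCallSerializer serializer = new NativeCallSerializer()) {
            // Stand-in for a native call: reports which thread actually executed it.
            String executedOn = serializer.call(() -> Thread.currentThread().getName());
            System.out.println("call executed on thread: " + executedOn);
        }
    }
}
```

Callers on any thread still get a synchronous result, but the calls themselves never overlap and always come from the same thread.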

fricpa commented 1 month ago

Here's a reference for hidapi not being thread-safe: https://github.com/libusb/hidapi/wiki

FAQ: hidapi is not thread-safe in general.

How to use hidapi in multithreaded application? https://github.com/libusb/hidapi/issues/45
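If several Java threads must stay, the simplest way to respect a "not thread-safe" library is to put one process-wide lock around every call into it. The sketch below assumes that mutual exclusion is sufficient, which this thread does not establish for every hidapi backend. `HidApiGuard` and `locked` are hypothetical names, and the lambdas stand in for real native calls.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical wrapper: a single lock shared by every call into the native library.
final class HidApiGuard {
    private static final ReentrantLock LOCK = new ReentrantLock();

    private HidApiGuard() {
    }

    // Executes the call while holding the global lock, so no two calls ever overlap.
    static <T> T locked(Supplier<T> nativeCall) {
        LOCK.lock();
        try {
            return nativeCall.get();
        } finally {
            LOCK.unlock();   // always released, even if the call throws
        }
    }
}

final class HidApiGuardDemo {
    public static void main(String[] args) throws InterruptedException {
        // Two threads, e.g. a scanner and a reader, both going through the guard.
        Runnable work = () -> {
            for (int i = 0; i < 3; i++) {
                // Stand-in for a native call such as an enumeration or a read.
                String who = HidApiGuard.locked(() -> Thread.currentThread().getName());
                System.out.println("native call made by " + who);
            }
        };
        Thread scanner = new Thread(work, "scanner");
        Thread reader = new Thread(work, "reader");
        scanner.start();
        reader.start();
        scanner.join();
        reader.join();
    }
}
```

Compared with a dedicated worker thread, a lock keeps calls from overlapping but still lets them arrive from different threads, so it does not help if the native side has any thread affinity.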