Pi4J / pi4j-v1

DEPRECATED Java I/O library for Raspberry Pi (GPIO, I2C, SPI, UART)
http://www.pi4j.com
Apache License 2.0

GPIO Interrupts sometimes miss an interrupt #502

Closed: eitch closed this issue 3 years ago

eitch commented 4 years ago

I've got I2C boards with interrupt lines attached to BCM pins 9, 10, 22, 23 and 24. The interrupts generally work as expected, but sometimes interrupts arrive in very short succession, and I think the underlying native C code misses these state changes.

Could someone look at the interrupt handlers and check whether something is wrong?

eitch commented 4 years ago

The native thread for the interrupt handling starts here:

https://github.com/Pi4J/pi4j/blob/master/pi4j-native/src/main/native/com_pi4j_wiringpi_GpioInterrupt.c#L81

eitch commented 4 years ago

I should also mention that this is Java 11 (64-bit) on a Raspberry Pi 4.

It also used to work well; the problems started suddenly after an update that pulled in changes from Pi4J.

eitch commented 4 years ago

After debugging this for the last two days, I can say that something is wrong with the native C code that does the polling. I created a simple application with two IO expanders on I2C. Three outputs of one board are wired to three inputs of the other, and the input board's interrupt line is attached to GPIO 24.

On start, my code sets all outputs to 1 and waits until it reads all three inputs as 1; after one second it sets the three outputs back to 0 and waits until all inputs read 0 again. This runs in an endless loop. After a few minutes the loop hangs, and I can see that the board's interrupt is never handled, because the interrupt LED never goes out.
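The loopback test above can be sketched as follows, with a timeout added so that a missed interrupt is reported instead of hanging the loop. This is a minimal sketch, not Pi4J API: `LoopbackTest`, `writeOutputs` and `onInputsRead` are hypothetical names, and it assumes the application's GPIO interrupt listener reads the input expander over I2C and passes the resulting bit mask to `onInputsRead`.

```java
import java.util.function.IntConsumer;

// Hypothetical harness for the loopback test; none of these names are Pi4J API.
final class LoopbackTest {
    private final IntConsumer writeOutputs; // hypothetical: writes a 3-bit mask to the output expander
    private final Object lock = new Object();
    private int lastInputs = -1;            // guarded by lock; -1 means nothing read yet

    LoopbackTest(IntConsumer writeOutputs) {
        this.writeOutputs = writeOutputs;
    }

    /** Call from the GPIO interrupt listener after reading the input expander over I2C. */
    void onInputsRead(int mask) {
        synchronized (lock) {
            lastInputs = mask & 0b111; // only the three looped-back inputs matter
            lock.notifyAll();
        }
    }

    /**
     * Writes the mask to the outputs and waits for the interrupt path to report it.
     * Returns false on timeout, which in this setup most likely means a missed interrupt.
     */
    boolean driveAndWait(int expected, long timeoutMs) {
        writeOutputs.accept(expected);
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (lock) {
            while (lastInputs != expected) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false; // never call wait(0), which would block forever
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
            return true;
        }
    }
}
```

A driver loop would alternate `driveAndWait(0b111, 5000)` and `driveAndWait(0b000, 5000)` with a one-second pause, logging each `false` result rather than blocking forever, which makes it easier to count how often interrupts go missing.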

What I have seen in the native code is that for every notification into Java, AttachCurrentThread is called, followed by DetachCurrentThread. This could also be the cause of https://github.com/Pi4J/pi4j/issues/479

I tried using ppoll(), but since I am not very good at C I am not sure I did it correctly; either way, it didn't help. Log statements didn't help either, as the loop simply stops after a while for no apparent reason.

Sadly, the sysfs GPIO interface is deprecated and shouldn't be used anymore anyhow, and every example on the internet simply uses poll(), just as this code does.

The sysfs files aren't real files and cannot be memory-mapped for a direct Java mapping approach, and Java doesn't let me watch the sysfs directory for these changes either. I would have to implement a file-polling mechanism, which would probably hurt the latency of these interrupts.
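For comparison, the pure-Java file-polling fallback described above could look roughly like this. It is a minimal sketch under stated assumptions: `SysfsValuePoller` is a hypothetical class, the pin is assumed to be already exported so that a file such as `/sys/class/gpio/gpio24/value` exists, and the caller is assumed to invoke `sample()` at a fixed period (for example from a `ScheduledExecutorService`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical fallback: sample a sysfs GPIO "value" file and report level changes.
final class SysfsValuePoller {
    private final Path valueFile; // e.g. /sys/class/gpio/gpio24/value (pin assumed exported)
    private int last = -1;        // -1 means no sample taken yet

    SysfsValuePoller(Path valueFile) {
        this.valueFile = valueFile;
    }

    /** Reads the current level; sysfs exposes the value as "0\n" or "1\n". */
    int readLevel() throws IOException {
        String s = Files.readString(valueFile).trim();
        return "1".equals(s) ? 1 : 0;
    }

    /** Takes one sample: +1 for a rising edge, -1 for a falling edge, 0 for no change or the first sample. */
    int sample() throws IOException {
        int now = readLevel();
        int edge = (last < 0 || now == last) ? 0 : (now > last ? 1 : -1);
        last = now;
        return edge;
    }
}
```

The latency concern is visible in the design: an edge is noticed only at the next sample, so latency is up to one poll period, and a pulse shorter than the period can be missed entirely, which is exactly the failure mode this issue is about.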

At the moment i don't see how to fix this issue.

eitch commented 4 years ago

I just tested on a Raspberry Pi 3 with the same aarch64 build and get the same results there.

eitch commented 4 years ago

An ugly workaround that works for me for now: I added a poller to the interrupt pin that checks the pin state, and if an interrupt was handled within the last second and the pin is in the state that signals a pending interrupt, the poller triggers the I2C read itself. This stops my application from hanging, but it adds quite a bit of latency in bad situations.

This also shows that after I perform such an I2C read, the interrupt handler stays broken for a while (the next interrupt hangs as well), and then it suddenly returns to normal. The workaround recovered 7 interrupts in about as many minutes.
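The watchdog logic of this workaround can be sketched as below. It is a minimal sketch, not the author's actual code: `MissedInterruptWatchdog`, `pinAsserted` and `readExpander` are hypothetical names, the pin check and the I2C read are injected as callbacks, and the clock is injectable so the 1 s window can be tested without sleeping.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical watchdog: if an interrupt was handled recently and the INT pin is still
// asserted, assume an edge was missed and perform the I2C read from the poller thread.
final class MissedInterruptWatchdog {
    private static final long NEVER = Long.MIN_VALUE;

    private final BooleanSupplier pinAsserted; // hypothetical: true while the INT line is at its active level
    private final Runnable readExpander;       // hypothetical: the same I2C read the interrupt handler performs
    private final LongSupplier clockMs;        // e.g. System::currentTimeMillis; injectable for testing
    private final long windowMs;               // 1000 ms in the workaround described above
    private volatile long lastHandledMs = NEVER;

    MissedInterruptWatchdog(BooleanSupplier pinAsserted, Runnable readExpander,
                            LongSupplier clockMs, long windowMs) {
        this.pinAsserted = pinAsserted;
        this.readExpander = readExpander;
        this.clockMs = clockMs;
        this.windowMs = windowMs;
    }

    /** Call from the regular interrupt handler each time it services the expander. */
    void onInterruptHandled() {
        lastHandledMs = clockMs.getAsLong();
    }

    /** Call periodically from the poller; returns true if it had to trigger the read itself. */
    boolean poll() {
        long last = lastHandledMs;
        boolean recent = last != NEVER && clockMs.getAsLong() - last <= windowMs;
        if (recent && pinAsserted.getAsBoolean()) {
            readExpander.run();
            return true;
        }
        return false;
    }
}
```

Because the poller only acts after recent interrupt activity, it stays idle during quiet periods; the price is that a missed interrupt is recovered only at the next poll, which is where the extra latency comes from.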

savageautomate commented 3 years ago

@eitch .. do you know if this issue persists or has been resolved with the changes you made in v1.4?

eitch commented 3 years ago

@savageautomate this is a difficult question to answer. I only ever saw it in a customer's production environment, where pneumatic pistons with a very short stroke switch from open to closed so quickly that interrupts went missing, and only sporadically. The system is currently running and I can't justify updating it just to test this.

From my tests I don't think it can be fixed; I think this is a bug in the interrupt notifications from the OS, or do you think differently?

I'll have to set up some kind of simulation here and let it run for a few hours, but that will take some time.

I guess we can close this for now.

savageautomate commented 3 years ago

@eitch

I think this is a bug in the interrupt notifications from the OS, or do you think differently?

I was thinking that was a strong possibility.
I had a project using SPI where the Linux user-mode API could not keep up with the data rates, and I had to write a custom kernel driver to handle it. A GPIO interrupt implementation based on polling sysfs files is not exactly ideal.

I agree we should close this for now and re-evaluate high-speed GPIO interrupts in V2, especially since WiringPi is now DEPRECATED and we may base GPIO events on a new library.

Thanks, Robert