golemparts / rppal

A Rust library that provides access to the Raspberry Pi's GPIO, I2C, PWM, SPI and UART peripherals.
MIT License
1.14k stars 90 forks source link

Many edge detection "interrupts" but level remains the same #149

Open andrewdavidmackenzie opened 2 weeks ago

andrewdavidmackenzie commented 2 weeks ago

I was debugging some weirdness in my app ("piggui") and added some logging (println!()) to when it detects edges on inputs.

input.set_async_interrupt(Trigger::Both, move |level| {
                        callback(bcm_pin_number, level == Level::High)
                    })

I'm using it with a pretty crappy button, that I imagine it has quite a bit of HW bounce in it. So I imagined a flurry of high/low/high/low level changes in short succession.

When testing however:

So, while I expected "bounce" I didn't expect the interrupt to report multiple level changes in short succession with the same level being reported twice in a row.

Thanks for clarifying what best to do.

andrewdavidmackenzie commented 2 weeks ago

When trying to cache the previous level detected in my own code, then only report changes up the chain, I have ran into difficulties due to the 'static bound on the FnMut that is passed to set_async_interrupt... I will keep trying though.

andrewdavidmackenzie commented 2 weeks ago

I see this code in interrupt.rs:

else if fd == interrupt.fd() {
                            let level = interrupt.event()?.level();
                            callback(level);
                        }

Since that is sitting in a thread in a loop, per pin, would you accept a PR to filter out interrupts when the level has not actually changed?

I would read the initial level before entering the loop, then compare always to previous and only call callback if level changed, and update previous.

golemparts commented 2 weeks ago

It has been several years since I last looked into this issue, so it's nice to have some fresh eyes on it.

While it's certainly possible RPPAL is handling something incorrectly, my conclusion at the time was that this is a limitation of the underlying linux drivers, with triggered events getting lost when too many occur within a short period of time, likely because the interrupt handler is still running when the interrupt gets triggered again. A switch getting pressed would definitely cause a large amount of events, and would require some debouncing.

Since that is sitting in a thread in a loop, per pin, would you accept a PR to filter out interrupts when the level has not actually changed?

I would read the initial level before entering the loop, then compare always to previous and only call callback if level changed, and update previous.

The problem with that approach is that we're hiding interrupt events. The level has changed. It's just that the Low between the two Highs didn't get reported, or vice versa. I think a better approach would be to assume an event was lost, and add one in between, but I'm not sure if I'm comfortable manufacturing events instead of just reporting the data we get from the drivers. Perhaps we just need better documentation to make it clear this can happen under certain circumstances, and let the end-user (in this case piggui) decide how best to handle it.

andrewdavidmackenzie commented 2 weeks ago

Understood. Do you think it also happens if you use only Rising, or only Falling edges?

Maybe there are bugs in the linux drivers, but that's getting a bit beyond my "expertise"... is there somewhere upstream for the drivers I could ask?

golemparts commented 2 weeks ago

It's a little beyond my expertise as well, as for anything other than direct GPIO register reads/writes I'm relying on linux drivers. Unfortunately I currently don't have enough time to invest in some dedicated testing to figure out under what circumstances interrupts are lost. If the cause is indeed too many interrupts within a short period of time, it would also happen for only rising or only falling edges.

My guess would be these aren't bugs, but limitations of handling interrupts in linux. I would check the official Raspberry Pi forums first, as any Python library that works with interrupts in a similar way as RPPAL does would run into the same issue.

andrewdavidmackenzie commented 2 weeks ago

I posted here on Raspberry Pi forum.

andrewdavidmackenzie commented 2 weeks ago

It would be good to get to the bottom of this, and if at all possible NOT loose edges...

But, meantime, FWIW I would have rrpal respect the "contract" with its users, to inform (only?) ok of level changes on an input....hence if the event from the Linux driver reads the same level as before I would not call the callback.

golemparts commented 1 week ago

But, meantime, FWIW I would have rrpal respect the "contract" with its users, to inform (only?) ok of level changes on an input....hence if the event from the Linux driver reads the same level as before I would not call the callback.

I appreciate the feedback, but like I mentioned before, barring any bugs in RPPAL or the drivers, a level change did happen. Not calling the callback means that we'd be hiding events, which I don't think is the correct behavior here. We're getting the same level multiple times in a row because one or multiple interrupt triggers were not reported by the underlying drivers. If the cause is a timing issue, we would be exacerbating the problem by skipping even more events.

There is an argument to be made for inserting a callback for the missing level change, but in the end RPPAL is just an interface library that makes it easier to access the various peripherals, and not a way to correct the data as reported by the system. I do need to add this to the documentation.

For piggui the situation is different, as it's an application for end-users to interact with and report the status of the GPIO pins. I would probably insert the missing interrupt trigger for the most accurate reporting, but it's of course up to you if you think filtering out the events is the better choice.

andrewdavidmackenzie commented 1 week ago

Fair enough.

I guess it comes down to the definition of "event" for the user of rrpal.

For me a level change event, should show a change in level.

Two level changes that in fact both report the same level, is kind of a non-event in my book. :-)

robsdedude commented 1 week ago

I came across the same issue yesterday. Getting many events in short succession (even with the same state) wasn't a big deal as debouncing it is common practice. The bigger issue though was that the last event of the sequence was not always the expected level. E.g., when pulling the pin from high to low, after some bounces, the last event was sometimes reporting high.

Imo, this makes this API somewhat pointless. The only thing the interrupt could be used for then is to be notified that something change so that you can go and actively read the pin after some silence period but the level passed to the interrupt is not usable at all.

golemparts commented 1 week ago

Imo, this makes this API somewhat pointless. The only thing the interrupt could be used for then is to be notified that something change so that you can go and actively read the pin after some silence period but the level passed to the interrupt is not usable at all.

I haven't noticed issues when detecting changes for both rising and falling edges on the same GPIO pin at reasonable speeds. It seems to skip events when the interrupt is triggered in rapid succession, like when pressing a switch. The documentation needs to make this clear. The way I would interpret the event is "a rising or falling edge was detected, this is the current pin level", with the logical conclusion that the level had an opposite value before that.

If you want better performance, you would have to bypass the linux drivers or just go bare metal entirely. This isn't a limitation of RPPAL but of the underlying drivers, and while I could insert or hide events to make the output make more sense, I think it's more appropriate for the end-user to decide what to do in that situation.

andrewdavidmackenzie commented 1 week ago

Piggy-backing on this issue thread, to mention that we've done a new release of the "piggui" GUI for controling and viewing GPIO hardware on a Raspberry Pi, using rppal.

You can find the release notes with videos and more gifs here

As a teaser, here is a YouTube video showing the LED and waveform display of an input, connected to a real hardware button.

If you have ideas for future functionality, please contribute issues or start a discussion

golemparts commented 1 week ago

As a side note, I started working on transitioning to the GPIO v2 API (here) back when it wasn't certain the Pi 5 would still give us direct GPIO register access, but never completed the transition when it was clear we could use /dev/gpiomem0 instead. The GPIO API, currently v1, is used for interrupt detection. The v2 documentation mentions this regarding missing events:

The size of the kernel event buffer is fixed at the time of line request creation, and can be influenced by the request.event_buffer_size. The default size is 16 times the number of lines requested.

The buffer may overflow if bursts of events occur quicker than they are read by userspace. If an overflow occurs then the oldest buffered event is discarded. Overflow can be detected from userspace by monitoring the event sequence numbers.

So while this doesn't solve the issue, once I get around to moving the interrupt handling to the v2 API, which adds those event sequence numbers, at least I can detect when an event has been skipped, and possibly add that info to the callback.

EDIT: Theoretically I could make the buffer as large as possible to limit how often it overflows if that's what's causing the missing events. It wouldn't solve the issue, just delay it, but it's something.

andrewdavidmackenzie commented 1 week ago

Thanks for pursuing this.

When I observed this (using println in my code), I saw repeated levels in events, very quickly.... i.e. it was only one or two events (the initial "down" on a button and maybe a bounce) before I observed a stream of 23 events, all with the same level.... so it didn't appear at first blushes like a buffer overrun of events (unless a buffer size of 1 or 2). FYI.

golemparts commented 1 week ago

I'm not sure what the buffer size is for the v1 API, or if it even buffers events at all, as the request.event_buffer_size value is only part of the v2 API, so a buffer size of 1 or 2 could be possible. At least with v2 we should have a clearer picture of what's happening, even if there is no real solution.

The only mention I found of a bug was regarding a race condition with interrupts on multiple GPIO pins, but I assume you weren't testing this on multiple pins at the same time.

golemparts commented 1 week ago

Since I'm adding the event sequence number to the callback, I might as well add the timestamp at the same time (best estimate of time of event occurrence, in nanoseconds, provided by the GPIO v2 API), which can be helpful to detect how much time has passed between two trigger events.

warthog618 commented 1 week ago

I'm not sure what the buffer size is for the v1 API, or if it even buffers events at all, as the request.event_buffer_size value is only part of the v2 API, so a buffer size of 1 or 2 could be possible. At least with v2 we should have a clearer picture of what's happening, even if there is no real solution.

v1 documentation. The buffer is fixed at 16. And if the buffer overflows the most recent event is discarded, which is really unfortunate for userspace debounce algorithms or anything interested in the final state. So for v1 you really don't want that buffer to overflow.

v2 has debounce support, and that is applied before the event buffer. That is probably what you want if you are dealing with noisy hardware.

Even v2 doesn't help with interrupts occurring faster then the kernel can handle them though. Those get lost with no indication, as they are too fast for the kernel to see.

Can't say I'm a fan of mixing direct register access with using a Linux driver. You have no protection against contention so you might get some entertaining interactions and those would be a serious pain to debug.

andrewdavidmackenzie commented 1 week ago

" I assume you weren't testing this on multiple pins at the same time."

No, just one input connected to a button. (you can see me using it in this video https://youtu.be/YZMhHqfmcx4 - but it doesn't show the trace of events....I could do that for you if needed)

" I might as well add the timestamp at the same time (best estimate of time of event occurrence, in nanoseconds, provided by the GPIO v2 API), which can be helpful to detect how much time has passed between two trigger events."

Yes, I add one later as soon as I receive the event, so one closer to the real time would be appreciated. I end up converting to a DateTime in chrono for displaying them on the moving waveform shart.

"And if the buffer overflows the most recent event is discarded"

That is a strange choice....getting the last event and the last known level would seem a better choice to me.

"v2 has debounce support, and that is applied before the event buffer. That is probably what you want if you are dealing with noisy hardware."

That sounbds positive.

"Even v2 doesn't help with interrupts occurring faster then the kernel can handle them though. Those get lost with no indication, as they are too fast for the kernel to see."

Off course, there is a limit to what the kernel can handle, always possible to be too quick for it.

warthog618 commented 1 week ago

"And if the buffer overflows the most recent event is discarded"

That is a strange choice....getting the last event and the last known level would seem a better choice to me.

Which is exactly why it was changed in v2. Not sure why v1 was that way - it was probably just easier and simpler and no one thought too hard about the implications. Discarding the oldest is more work given the way kernel FIFOs operate, but it is well worth the effort.

andrewdavidmackenzie commented 1 week ago

A contributor passed this to me:

libgpiod is the standard library for driving GPIOs via the GPIO character device, /dev/gpiochipN, which is the standard GPIO userspace interface for Linux. That interface is the only one that works on all Pi variants. Other libraries that use the obsolete sysfs interface or the /dev/gpiomem interface will not work on the Pi5.

Have you tried rppal on a Pi5?

golemparts commented 1 week ago

v1 documentation. The buffer is fixed at 16. And if the buffer overflows the most recent event is discarded, which is really unfortunate for userspace debounce algorithms or anything interested in the final state. So for v1 you really don't want that buffer to overflow.

Thanks for the insight. That explains a lot, and adds even more incentive to upgrade to v2.

Can't say I'm a fan of mixing direct register access with using a Linux driver. You have no protection against contention so you might get some entertaining interactions and those would be a serious pain to debug.

I would love to use the GPIO API for everything, but it has some limitations (in both v1 and v2) related to alternate function modes that makes it incompatible with part of RPPAL's feature set. It's something I'll be re-evaluating after the transition to v2, and limit direct register access only to functionality the API doesn't provide.

golemparts commented 1 week ago

A contributor passed this to me:

libgpiod is the standard library for driving GPIOs via the GPIO character device, /dev/gpiochipN, which is the standard GPIO userspace interface for Linux. That interface is the only one that works on all Pi variants. Other libraries that use the obsolete sysfs interface or the /dev/gpiomem interface will not work on the Pi5.

Have you tried rppal on a Pi5?

RPPAL is compatible with the Pi 5. We moved away from using sysfs a long time ago, and the Pi 5 provides direct GPIO register access on the RP1 through /dev/gpiomem0. See #127 for development details of the initial Pi 5 support.

warthog618 commented 1 week ago

I would love to use the GPIO API for everything, but it has some limitations (in both v1 and v2) related to alternate function modes that makes it incompatible with part of RPPAL's feature set. It's something I'll be re-evaluating after the transition to v2, and limit direct register access only to functionality the API doesn't provide.

Yeah, the GPIO character device is strictly a GPIO API. Call that a limitation if you like. There are other Linux subsystems and APIs for other pin functions. Switching pins between modes is very platform dependent and typically done with device-tree or some other API.