Open omeh-a opened 3 months ago
Duplicate of https://github.com/au-ts/sddf/issues/138.
Although not sure when we'll get to this one so probably better to have it as a separate issue.
@Ivan-Velickovic Have we got a preferred way of handling real errors? As of right now, the only mechanism we have for raising the alarm to the virtualiser is to notify and enqueue a request. We ought to have a discussion about how we want to deal with issues arising in terms of handling.
As a general case though, I think it might be smartest to have a retry system which will drop the request once exhausted. E.g. retry 5 times, drop request and return to virt with error if no successful ops between those retries.
I'm not sure what you mean, an error from which side? The client or the driver?
For real timeout errors, which I think is what you're referring to, we should just forward the error to the client and leave it up to them to decide what kind of retry policy they want. In reality they'll be some default policy in the client-level library.
We should avoid any policy in the driver whenever possible.
I meant on the driver side, but yes I think it's reasonable to just try and leave it in a state where the client can manage the rest.
In other news, upon closer inspection this issue is actually much worse than I anticipated. Using our PN532 example shows that the IRQs fire continuously whenever the driver is not operating, giving us >1000 timeouts between every PN532 read attempt. I know this is a PN532 specific thing, but pragmatically there really is nothing we can do for it other than neglect the timeouts because they are all simply artefacts of the NACKS from issue #235. If we do anything like discarding the request upon a timeout, the PN532 simply cannot operate since all the frames transmitted are too small for a single R/W buffer on the I2C hardware.
By constrast, there are never timeouts for the DS3231 example. Bearing this in mind, would it be inappropriate to consider adding a no-timeout
flag to requests? This way we can handle timeouts for all other devices that aren't borked and don't need to reschedule work on the PN532 immediately.
Perhaps this is the best way to handle this for now, because having a timeout handler is needed for any kind of prospective complex application but any timeout handler of any type would de-facto break the PN532 which is needed immediately.
Alternatively we can just neglect this until #235 is resolved. What do you think? Pragmatically the only actual problem here is the PN532, writing the timeout is trivial otherwise unless there is some other problem we don't know about. @wom-bat @Ivan-Velickovic
You mentioned that the PN532 is using I2C incorrectly? Can you describe why? I find it surprising since it seems like a fairly popular device.
You mentioned that the PN532 is using I2C incorrectly? Can you describe why? I find it surprising since it seems like a fairly popular device.
See https://github.com/au-ts/sddf/issues/235 - it creates a constant stream of NACK conditions that seem to be related to the I2C part of the PN532's microcontroller not properly halting the bus when it's busy doing NFC operations. It happens both with our driver and off-the-shelf ones I have tested with Arduino
Currently, the timeout handler in the Meson I2C driver is unable to function properly. It was firing constantly when operating demo code and has been disabled as a result. This is a serious problem however - in the event that a device ACKs a transaction and stops responding, the entire I2C driver will stall forever. This compromises the trustworthiness of the entire driver.
This issue proposes fixing this handler such that: a. It is able to function properly for all our demos - inc. PN532 b. It doesn't spuriously fire.
This issue might be related to the current hardcoded clock dividers used for I2C. If this is the case, this issue actually requires a clock driver to be implemented in order for correct clock setting.