InfiniTimeOrg / InfiniTime

Firmware for Pinetime smartwatch written in C++ and based on FreeRTOS
GNU General Public License v3.0
2.71k stars 926 forks source link

Hang in TwiMaster::Read #79

Closed Avamander closed 3 years ago

Avamander commented 3 years ago

I just encountered a situation where the PT froze on line 91 of void TwiMaster::Read(uint8_t deviceAddress, uint8_t *buffer, size_t size, bool stop).

Line in question:

while(!twiBaseAddress->EVENTS_LASTRX && !twiBaseAddress->EVENTS_ERROR);

It probably should have a timeout counter of some sort because such a freeze is not good when someone has a sealed PineTime.

Avamander commented 3 years ago

tt_392 on Matrix said:

I ended up just waiting for the thing to lock up, then turn off twim, force the pins back in the right position, and turn twim on again

JF002 commented 3 years ago

From our talk in the chat room, it looks like the TWI device is buggy and locks itself from time to time. Adding a timeout in the driver and completely reset the TWI device when we detect a hard lock seems the only solution.

TT-392 commented 3 years ago

Here is how I ended up fixing it btw https://github.com/TT-392/TT-time/blob/master/src/external/infinitime/i2c_pine.c

As far as I understood from the nordic forums (allthough I can't seem to find the specific thread anymore). It seems that you specifically have to forcefully reset your SCL and SDA pins.

JF002 commented 3 years ago

@TT-392 Thanks for the fix! It looks like the TWI drivers hangs when SPI and TWI are heavily used at the same time... I guess you couldn't find another solution that would prevent the device from locking?

rafacouto commented 3 years ago

Investigating this issue, I think it could be related to Nordic's hardware anomaly 109. Moreover, it could be also the same problem described on issue lupyuen/pinetime-rust-mynewt#24 affecting SPIM.

There are some workarounds in the mentioned document. Handling peripheral IRQ to wake up seems the best one in this case. See example code in 3.8.3 TWIM workaround subsection.

JF002 commented 3 years ago

I've just had a look at the anomaly referenced by @rafacouto. This is pretty bad : a race condition inside the CPU. And the workaround are quite complex to put in place.

However, the anomaly specifies that SPIM TX and TWIM TX are affected by the anomaly. TWIM RX is not referenced. But RX could be locked because TX caused the anomaly just before?

Anyway, I'm trying to reproduce it so that I can analyze that issue further... with no result for now I've tried using InfiniPaint (paint app) from develop, and the pong game developed by @ColdBrewCaffine (https://github.com/ColdBrewCaffine/Pinetime/tree/Run-dev-2).

Soooo... can you help me find a way to reproduce this issue more or less reliably?

JF002 commented 3 years ago

Ok, I can reproduce it! I just have to wait longer. The fix from @TT-392 seems to work, even though I would have preferred to be able to actually fix the issue instead of using this kind of workaround.

For info, the 5000 loop iterations correspond to ~160550 CPU cycles, which corresponds to ~2.5ms @64Mhz.

Now, I need to check that the code behaves correctly when the twi transaction is aborted.

JF002 commented 3 years ago

I've implemented the fix recommended by @TT-392, it's running fine for more than 4 hours with the pong game developed by Electrolyte. See this branch : https://github.com/JF002/Pinetime/tree/fix-twi-hang and this PR : https://github.com/JF002/Pinetime/pull/111

JF002 commented 3 years ago

Fixed in https://github.com/JF002/Pinetime/pull/111