InfiniTimeOrg / InfiniTime

Firmware for Pinetime smartwatch written in C++ and based on FreeRTOS
GNU General Public License v3.0
2.76k stars 944 forks source link

Touch controller error warning on PineTime #763

Closed Elara6331 closed 3 years ago

Elara6331 commented 3 years ago

What happened?

Touch controller error warning appears on PineTime on boot

What should happen instead?

Touch controller error warning should not appear on PineTime

Reproduction steps

More details?

This did not happen before installing the artifact from the mentioned PR. After tapping Proceed, touch works as expected.

Version

add-motion-service

Companion app

No response

Riksu9000 commented 3 years ago

If you have an issue with a PR, it should be mentioned in the PR instead, but this may not be related to that PR specifically as it doesn't change the code relevant for detecting the error.

I couldn't reproduce the issue. Did you validate the add-motion-service firmware after you flashed it? If not, what was the version that it rolled back to? Did you reset by pressing reset in the validation dialog or by long pressing the button? What is the value of Touch. in the about screen? The error is shown if it is something other than b4.0.1.

125638089-814c513c-b03b-48cf-9a4c-130f425a2f28

Elara6331 commented 3 years ago

If you have an issue with a PR, it should be mentioned in the PR instead, but this may not be related to that PR specifically as it doesn't change the code relevant for detecting the error.

Yes, I realized that and believe it has to do with one of the changes for 1.7.0 since the PR was in that milestone. I mentioned the PR because I wanted to make sure there wasn't something wrong there specifically, though that is unlikely.

I couldn't reproduce the issue. Did you validate the add-motion-service firmware after you flashed it? If not, what was the version that it rolled back to? Did you reset by pressing reset in the validation dialog or by long pressing the button? What is the value of Touch. in the about screen? The error is shown if it is something other than b4.0.1.

Yes, I validated it. All I see in the touch field is 0.0.0.

Elara6331 commented 3 years ago

Interesting. It now says 0.1.1

Riksu9000 commented 3 years ago

I suspect that your watch may have unfortunately ended up with a different touch sensor. When did you get your PineTime? @JF002 has mentioned receiving PineTimes with wrong sensors before and that Pine64 has had to refuse a batch due to a wrong sensor https://github.com/InfiniTimeOrg/InfiniTime/pull/489#issuecomment-934676608.

If your watch works fine, then you can just ignore the warning. If this issue is widespread, then the error detection might need to be removed, as it would show too many false positives.

If we were to release 1.7.0 with this error detection, we should definitely warn users that it may trigger false positives and that they should report whatever the Touch. value is here. @JF002

Elara6331 commented 3 years ago

I got it a couple weeks ago. The touch does work fine (other than the random touch inputs which others also seem to experience).

geekbozu commented 3 years ago

I'm also getting this on my development kit Riksu, Which is running current dev+ some patches. It also does not happen everytime. I will look into it more soon :)

JF002 commented 3 years ago

This feature was added in https://github.com/InfiniTimeOrg/InfiniTime/pull/489, and it looks quite safe as all PineTime should be equipped with the same CST816S touch controller. I knew that one batch of PineTime was returned to factory because they mounted the wrong controller, but as far as I know, those units were never shipped to end users.

So, if we saw this error message on actual PineTimes, we might have an issue in the code that detects the controller or we are facing some unknown variation of the hardware..

@geekbozu Could you investigate this on your device? Does the touch controller returns another device ID, or is there any error in the code? Thanks!

kieranc commented 3 years ago

I also see this error intermittently on my watch which was in the mid July 2021 batch I just checked about when it happened and it was showing 'touch. b4.0.ff'

Elara6331 commented 3 years ago

Something very strange seems to be going on. I checked the about screen again and it changed again to b4.1.1. Then, I left the screen and scrolled a bit. InfiniTime crashed (possibly unrelated), and then when it rebooted, there was no warning and the value was b4.0.1.

JF002 commented 3 years ago

Here's a picture on a devkit equipped with the wrong touch controller (this is a proto version and it was never releases (and never will) to the public. IIRC, this is a CST716 instead of the CST816. It doesn't work properly on InfiniTime as it does not support all gestures we need.

IMG_20211023_094459

As you can see, the version detected is 20.80.2 (chip.vendor.version). I would have expected the vendor ID to be the same on both device, but that's not the case. This makes me think that those ID might not be as reliable as we would like... or maybe there are multiple "vendors" for the same chip? or clones?

Also, I couldn't reproduce the behavior detailed by @kieranc and @Arsen6331 (the ID that are different from time to time), even after reseting my units multiple times...

Riksu9000 commented 3 years ago

The version ID on that prototype is still 2, which as far as I know 1 is CST816 and 2 is CST716, so if there are PineTimes with chips from a different vendor or clones in the wild, using the version ID might still have some hope. https://github.com/InfiniTimeOrg/InfiniTime/pull/492#issuecomment-917434246

@kieranc getting 0xFF means that the first two values were read successfully, but twiMaster.Read() returned an error on the third read for some reason. The addresses are back to back, so they could all be read at once if that is more stable. https://github.com/InfiniTimeOrg/InfiniTime/blob/9538eb97166410ebc362addfb76ec54895491cd7/src/drivers/Cst816s.cpp#L62

I'm also not getting any variation on those values. It's always b4.0.1

Also is the random touches issue separate from raise wrist being too sensitive? Could it be that the same misbehaving chips that read random inputs also sometimes return wrong ID values?

geekbozu commented 3 years ago

TBH I wonder if we even need the TWI bus at 390Khz, 100kHz or 200 might be sufficient for the amount of info we poll from it. And that will make it much more stable. Specially since its only sensors on it. The accelerometer would be the only real "bottleneck" IMO...and that will be fixed/shennagined in time.

Avamander commented 3 years ago

It would probably be fine, but might as well just make it an option for unstable devices. I wouldn't degrade performance ~2-4x if there's no good reason on that specific hardware.

geekbozu commented 3 years ago

So that is the thing, I'm not convinced it would be a "degradation" as only the HRM, Accelerometer, and touch sensor are on the TWI bus. There should be a minimal if any performance impact for slowing it down, save when we are heavily polling the Accelerometer which currently we do not do. I'll do some performance testing with that and look into this soon. I'm wondering if there is some cross talk or something hurting the TWI bus at these speeds honestly...

Riksu9000 commented 3 years ago

I just got this error and the value is b4.1.1, so the "vendor id" changed.

JF002 commented 3 years ago

I've just done some tests : I check the ids (chip, vendor firmware) each time the function Cst816S::GetTouchInfo() is called. Here are my findings:

So... should we just assume the I²C communication is not reliable? That doesn't look right to me... I know that this is a serial bus and that interference may occur, but in this case, it doesn't really look like random interference (it happens way too often).

Basically, I think the check we implemented in the touch driver to check that a compatible chip is installed is good, but that the communication with the chip is not as reliable as we would like...

Avamander commented 3 years ago

Is the hardware configured properly, all the pullups where they should be?

geekbozu commented 3 years ago

Device has hardware pull ups on it. 2.7K IIRC which should be more then strong enough for the 400Khz we are operating the bus at.

JF002 commented 3 years ago

I noticed something interesting : I patched SystemTask to read the registers of the touch controller more often (every 100ms)

IIRC, when I first implemented the touch driver, the controller would trig a single IRQ when the user touched the display. In this IRQ, I would read the register to figure the gesture (tap, swipe,...) out and fetch X/Y values.

Now (I think since the touch handler was revisited and improved by @Riksu9000), a lot of IRQ are triggered when the user touches the display, meaning we read the content of the register while the user touch/release/move their finger.

Is there anything that would prevent the touch controller from communicating correctly while it's touched by the user?

kieranc commented 3 years ago

Can you test it with the watch attached to your wrist or otherwise grounded somehow and see if that impacts it? I'm pretty sure this issue is the one I was experiencing with my dev watch and it seemed like the fact it wasn't on a wrist exacerbated the problem.

kieranc commented 3 years ago

Is the hardware configured properly, all the pullups where they should be?

I can't see discrete pullups on the schematic, are they internal to the nrf? Activated?

Edit. Never mind, 2k2 on the accelerometer

geekbozu commented 3 years ago

So to add some to this, I noticed while reviewing code on #691 that the indicator would jump to MAX when taping the screen....I need to verify but I think the touch screen is also clobbering reads to the accelerometer temporarily. The more i think about it i wonder if this is some of the cause of the weird HR sensor behavior as well.

JF002 commented 3 years ago

I did an additional test : built a test program that only polls the touch sensor every 50ms. No OS, not IRQ, no mutex, no other devices : image

Even with this simple code, I still get bad readings on the device IDs when I keep the finger pressed on the display, or when I tap it.

Does this mean that we have to accept that some readings will be erroneous? We can still sort of workaround this, but we will never be able to achieve 100% reliability:

JF002 commented 3 years ago

I implemented those checks in https://github.com/InfiniTimeOrg/InfiniTime/pull/811.

I'm not sure if it's related, but I noticed I could easily crash (watchdog reset) InfiniTime when playing Twos. The execution was stuck in TwiMaster::Read() or TwiMaster::Write(), waiting for the STOP event that never came.

Note that the callstack showed that these blocking call where always coming from SystemTask::UpdateMoin(). Removing this call allowed me to play the game for several minutes without a crash. This is not new (https://github.com/InfiniTimeOrg/InfiniTime/issues/79), but we have never found a solution that was completely reliable...

Riksu9000 commented 3 years ago

Check the validity of the touch info (gesture, x/y values) before using them in the software

Did you encounter bad touchinfo values?

If the IDs are wrong when touching, couldn't we check that the touching bit is zero?

JF002 commented 3 years ago

I added a log to show when bad values were detected but didn't log the values so I cannot tell how corrupted they were, unfortunately. That would be an interesting test! Also, this is very confusing, but I noticed that I would experience far more issues on 2 devkits than on my sealed unit. For example, the devkits crashes after a few moves in Twos (and I could see a lot of touch/I²C errors in the log), while I could play several minutes on my sealed pinetime without issues (but no log on SWD, as it's sealed).

kieranc commented 3 years ago

Also, this is very confusing, but I noticed that I would experience far more issues on 2 devkits than on my sealed unit. For example, the devkits crashes after a few moves in Twos (and I could see a lot of touch/I²C errors in the log), while I could play several minutes on my sealed pinetime without issues (but no log on SWD, as it's sealed).

In this context, is the sealed device on your wrist and the devkits on the desk/otherwise mounted?

edit: do the devkits have backs/vibration motors attached?

StarGate01 commented 2 years ago

I noticed that the CST816S in my watch (sometimes) ignores the reset signal. Instead I had to defer the register initialization to the first invocation of the touch event interrupt callback, because at this point the I2C communication window is open.

JF002 commented 2 years ago

@StarGate01 Do you mean that the reset signal sent on the reset pin (https://github.com/InfiniTimeOrg/InfiniTime/blob/develop/src/drivers/Cst816s.cpp#L23) does not actually resets the chip?

StarGate01 commented 2 years ago

@JF002 Yes, that is exactly what I suspect. I noticed this behavior because on my pinetime dev kit, reading the chip ID on firmware boot failed >90% of the time. It only worked reliably when I furiously tapped the screen during startup. The datasheet states that the Cst816S has an activity window (default 2 seconds) after a touch event, and I2C is only active during that window. The datasheet recommends performing the exact reset procedure as was implemented already to ensure the window is active, however this did not work reliably for some reason on my device. It might be something about a potentially cloned touch chip, some weird electrostatic effects, capacitance on the signal lines or straight up a hardware defect on my device. However, some users reported the touch driver failing to initialize or their chip Id being zero very sporadically in the past as well.

I notices that upon a failed reset signal not only the chip ID could not be read, but also the interrupt configuration register failes to be written to. This resulted in an incorrect event reporting behavior on my device. My theory of the reset failing is further supported by the fact that the modified configuration state of the touch chip survives firmware reboots, but resets to its default value after a power loss.

Interestingly, the old reset procedure works great on my P8b watch, which uses a Cst716 touch chip. However, that chip does not feature auto-sleep and auto-wake, but instead has an always-on I2C bus (until you manually put it to sleep). Why the actual chip on the PCB is marked Cst816 is another mystery, it reports and behaves exactly like a Cst716.

JF002 commented 2 years ago

@StarGate01 Thank you very for those information! You're right, some users reported issues with the touch panel that was not correctly detected during boot, and that's why we had to remove the warning that was displayed at startup : perfectly functioning touch panels were incorrectly detected as unsupported.

So your workaround is to try to initialize the IC again in the event handler of the IRQ pin? Does it work more reliably with those IC that ignore the reset signal?

StarGate01 commented 2 years ago

@JF002 Yes, that is my workaround, see https://github.com/StarGate01/InfiniTime/commit/402ebaebedfee0e5846842fc8743959eae855a31#diff-9d9d5389cf4cc0c3064d0778c604a6c4f06329d64e4c18a54464a7cbba9682d2= .

I suspect there could be some residual charge pulling the reset pin high-ish, as some users reported the touch panel working again after they grounded the aluminium case by touching it.

The workaround works for my device every single time.