Closed adeebshihadeh closed 6 months ago
~c2585539cc5ebcb3|2023-07-29--07-37-37--101~ actually CAN dropped out
More cases, from a random sample of 50k rlogs from the last 30 days on release. It's not exactly 100 ms every time:
| segment | ms | platform | devicetype |
| --- | --- | --- | --- |
| c2585539cc5ebcb3\|2023-07-29--07-37-37--101 | 72.1630859375 | TOYOTA COROLLA TSS2 2019 | 8 |
| 06362e6a7f0b400b\|2023-08-06--21-08-29--32 | 65.93787384033203 | TOYOTA CAMRY 2018 | 7 |
| 25593b1a5f760b07\|2023-07-13--16-01-55--13 | 81.96173095703125 | TOYOTA HIGHLANDER 2020 | 8 |
| b576d2ff8a193b4a\|2023-07-26--23-03-30--3 | 89.41193389892578 | KIA NIRO EV 2020 | 7 |
| 20665f9a424ada41\|2023-07-27--17-09-07--2 | 57.68474864959717 | CHRYSLER PACIFICA 2020 | 7 |
On Sunnypilot I have also experienced this and tried to debug it, with no success. I saw it with `carState`, not sure if that helps. I believe it is related to cereal, but I still have a big knowledge gap with OP.
Oh, something else I noticed: using the network over cellular produced more cumLags than using it over Wi-Fi. Using no network connectivity at all yielded the lowest cumLags. I feel there might be a correlation somewhere.
IMPORTANT UPDATE: This issue seems to be more consistent with a HW failure. We asked the user to check the cables.
I have another user who has also been experiencing controls lag. We asked him to install stock OP to rule it out, and he still has significantly high cumlags. He has a C3 device and a Kia EV6 (CAN FD).

- `a12a00d7b630927b|2023-10-25--00-05-38`: lag reaches 450 ms on this route ![image](https://github.com/commaai/openpilot/assets/7696966/47cfd9e9-4b9c-4a27-88fc-5218f25a89a4)
- `a12a00d7b630927b|2023-10-24--10-57-19`: morning route, about 260 ms ![image](https://github.com/commaai/openpilot/assets/7696966/9fe9f3bf-df01-43f8-9dce-6b23954a3543)
- `a12a00d7b630927b|2023-10-24--17-27-26`: evening route, about 860 ms at the end ![image](https://github.com/commaai/openpilot/assets/7696966/f73bb2ce-c655-43b7-9def-761f8f3b899e)
696748e0ac8082fb|2023-10-24--10-45-04
Another user also experienced this issue. This is stock openpilot + 2023 Hyundai Palisade HDA2 car port changes. Seems to be related to pings with Athena but I could be wrong. Any information we can provide for troubleshooting?
https://github.com/commaai/openpilot/assets/47793918/749f9804-ed44-474d-969d-139d7dba869d
The following routes are my own; I have a C3X.
This one is on a very old commit (after longitudinal was enabled on the Ioniq PHEV), https://github.com/commaai/openpilot/commit/a552fafd8864bfb31a6e0f042f2816f276f6e66e: e1107f9d04dfb1e2|2023-10-17--18-41-47
One ride with stock OP (nlp-driving https://github.com/commaai/openpilot/tree/ace3a59dad2c5c366bccf09cfca9a7a1d75b3501) under these conditions:
e1107f9d04dfb1e2|2023-10-25--10-01-54
Ride with stock OP (nlp-driving https://github.com/commaai/openpilot/tree/ace3a59dad2c5c366bccf09cfca9a7a1d75b3501) under these conditions:
e1107f9d04dfb1e2|2023-10-25--09-46-59--1
Ride with stock OP (nlp-driving https://github.com/commaai/openpilot/tree/ace3a59dad2c5c366bccf09cfca9a7a1d75b3501) under these conditions:
e1107f9d04dfb1e2|2023-10-25--09-39-32
I am also seeing that `spiChecksumErrorCount` goes up at roughly the same rate as the cumlags.
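One way to put a number on "roughly the same rate" is to correlate the per-sample increases of the two counters. The following is only a minimal sketch: the helper names `deltas` and `pearson` are hypothetical, and the sample values are made up for illustration, not taken from any route in this thread. It assumes both series were already extracted from the logs and sampled at the same timestamps.

```python
from math import sqrt


def deltas(counter):
    """Per-sample increase of a cumulative counter."""
    return [b - a for a, b in zip(counter, counter[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # a flat series has no defined correlation
    return cov / sqrt(vx * vy)


# Made-up example values (NOT from a real route): cumulative SPI checksum
# errors and cumulative lag in ms, sampled at the same instants.
spi_errors = [0, 0, 5, 9, 9, 9, 14]
cum_lag_ms = [0.0, 0.0, 10.0, 18.0, 18.0, 18.0, 28.0]

r = pearson(deltas(spi_errors), deltas(cum_lag_ms))
print(f"correlation of increments: {r:.3f}")
```

Correlating the increments rather than the raw cumulative values avoids the trivial result that any two monotonically growing counters look correlated.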
I seem to have found a reliable way to increase cumlag at will!
Executing `/data/openpilot/panda/tests/health_test.py` reproduces both the spike in the SPI checksum error count and the cumlag!
route: e1107f9d04dfb1e2|2023-10-27--12-36-59
That route is with sunnypilot, but I was completely still, with the car in park. The spikes in cumlag and `spiChecksumErrorCount` are triggered by me intentionally starting `health_test.py`, stopping it for a while, then starting it again.
For this test I had also disabled `ubloxd` (I have a C3X).
- Branch source: HKG: Car Port for Hyundai Palisade and Kia Telluride 2023-24 (HDA2) #27392
- Device: C3X
- Route ID:
696748e0ac8082fb|2023-10-24--10-45-04
Another user also experienced this issue. This is stock openpilot + 2023 Hyundai Palisade HDA2 car port changes. ~Seems to be related to pings with Athena but I could be wrong.~ Any information we can provide for troubleshooting?
Recording.2023-10-25.104624.mp4
Same route: `spiChecksumErrorCount` is spiking badly and seems to be causing everything else to go crazy too.
https://github.com/commaai/openpilot/assets/47793918/db2ac71f-e277-42cd-80f3-02fb52c431e2
@adeebshihadeh @sshane I've run out of ideas and things to check :( I am not too familiar with the panda code, but the error seems to be there. Can we please bump the priority on this? We've already spent many hours trying to tackle this and it's been extremely challenging.
This is some output from `debug_console.py`:
...
comma@tici:/data/openpilot/panda$ tests/debug_console.py
************************ MAIN START ************************
Config:
Board type: Tres
detected car harness with orientation 01
**** INTERRUPTS ON ****
- incorrect header sync or checksum 11 11 11 11 11 11 11
Interrupt 0x00000036 fired too often (0x00000002/s)!
- incorrect data checksum 0000
5a 81 00 00 c0 07 b7
- incorrect header sync or checksum 11 11 11 11 11 11 11
- incorrect data checksum 000e
5a 03 0e 00 00 00 fc
13 00 1a 13 00 1a 13 00 1a 13 00 1a 13 00
- incorrect data checksum 0007
5a 00 07 00 40 00 b6
13 01 00 13 01 00 13
- incorrect data checksum 000e
5a 03 0e 00 00 00 fc
13 00 1a 13 00 1a 13 00 1a 13 00 1a 13 00
- incorrect data checksum 0007
5a 00 07 00 00 00 f6
13 00 00 13 00 00 13
- incorrect data checksum 0007
5a 00 07 00 40 00 b6
13 01 00 13 01 00 13
- incorrect data checksum 0000
5a 81 00 00 c0 07 b7
- incorrect data checksum 000e
5a 03 0e 00 00 00 fc
13 00 1a 13 00 1a 13 00 1a 13 00 1a 13 00
- incorrect header sync or checksum 11 11 11 11 11 11 11
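To help read the dump above, here is a minimal sketch that decodes the 7-byte lines starting with `5a` as SPI headers and checks their checksum. Everything about the layout is an assumption based on my reading of panda's SPI code, not something confirmed in this thread: sync byte `0x5a`, endpoint, little-endian data length, little-endian max response length, then an XOR checksum seeded with `0xAB`. The function names are hypothetical.

```python
import struct

SYNC = 0x5A            # assumed header sync byte
CHECKSUM_START = 0xAB  # assumed XOR seed


def xor_checksum(data: bytes) -> int:
    """XOR all bytes together, starting from the assumed seed."""
    cksum = CHECKSUM_START
    for b in data:
        cksum ^= b
    return cksum


def parse_header(hex_line: str) -> dict:
    """Decode one 7-byte header line copied from the console output."""
    raw = bytes.fromhex(hex_line)
    if len(raw) != 7:
        raise ValueError("expected 7 header bytes")
    sync, endpoint, data_len, max_len = struct.unpack("<BBHH", raw[:6])
    return {
        "endpoint": endpoint,
        "data_len": data_len,
        "max_len": max_len,
        # XOR over payload plus checksum byte is 0 when the checksum matches
        "valid": sync == SYNC and xor_checksum(raw) == 0,
    }


# Header lines copied verbatim from the output above
for line in ("5a 81 00 00 c0 07 b7", "5a 03 0e 00 00 00 fc",
             "5a 00 07 00 40 00 b6", "5a 00 07 00 00 00 f6"):
    print(line, parse_header(line))
```

Under those assumptions all four distinct headers in the dump pass the header checksum, and the decoded `data_len` matches the number printed after "incorrect data checksum" (`0000`, `000e`, `0007`). That would suggest the headers arrive intact and the corruption is in the data phase, but treat it as a hint rather than a conclusion.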
Tracking in #32286
https://connect.comma.ai/9b25e8c1484a1b67/1682063616542/1682063673396 https://connect.comma.ai/297b4b460f361603/1688795349008/1688795363644