commaai / openpilot

openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 275+ supported cars.
https://comma.ai/openpilot
MIT License
50.04k stars 9.14k forks

controlsd: lagged for 100ms #28899

Closed adeebshihadeh closed 6 months ago

adeebshihadeh commented 1 year ago

https://connect.comma.ai/9b25e8c1484a1b67/1682063616542/1682063673396 https://connect.comma.ai/297b4b460f361603/1688795349008/1688795363644

sshane commented 1 year ago

~c2585539cc5ebcb3|2023-07-29--07-37-37--101~ actually CAN dropped out

sshane commented 1 year ago

https://connect.comma.ai/e886087f430e7fe7/1691516170616/1691516194613

sshane commented 1 year ago

More cases, found by looking at 50k random rlogs from the last 30 days on release. The lag is not exactly 100 ms every time:

segment, ms, platform, devicetype
c2585539cc5ebcb3|2023-07-29--07-37-37--101, 72.1630859375, TOYOTA COROLLA TSS2 2019, 8
06362e6a7f0b400b|2023-08-06--21-08-29--32, 65.93787384033203, TOYOTA CAMRY 2018, 7
25593b1a5f760b07|2023-07-13--16-01-55--13, 81.96173095703125, TOYOTA HIGHLANDER 2020, 8
b576d2ff8a193b4a|2023-07-26--23-03-30--3, 89.41193389892578, KIA NIRO EV 2020, 7
20665f9a424ada41|2023-07-27--17-09-07--2, 57.68474864959717, CHRYSLER PACIFICA 2020, 7
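
For anyone who wants to slice these rows further, here is a minimal sketch that parses them and groups the lag by device type. The five rows are copied from the list above as whitespace-separated text; the helper name `parse_row` and the dictionary keys are made up for illustration, and nothing here reads real rlogs.

```python
from collections import defaultdict

# Rows copied from the comment above: segment, ms, platform, devicetype.
ROWS = """\
c2585539cc5ebcb3|2023-07-29--07-37-37--101 72.1630859375 TOYOTA COROLLA TSS2 2019 8
06362e6a7f0b400b|2023-08-06--21-08-29--32 65.93787384033203 TOYOTA CAMRY 2018 7
25593b1a5f760b07|2023-07-13--16-01-55--13 81.96173095703125 TOYOTA HIGHLANDER 2020 8
b576d2ff8a193b4a|2023-07-26--23-03-30--3 89.41193389892578 KIA NIRO EV 2020 7
20665f9a424ada41|2023-07-27--17-09-07--2 57.68474864959717 CHRYSLER PACIFICA 2020 7
"""


def parse_row(line):
    """Split one row; the platform name is every token between ms and devicetype."""
    parts = line.split()
    return {
        "segment": parts[0],
        "ms": float(parts[1]),
        "platform": " ".join(parts[2:-1]),
        "device_type": int(parts[-1]),
    }


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Largest lag in this sample.
worst = max(rows, key=lambda r: r["ms"])

# Lag values grouped by the devicetype column.
by_device = defaultdict(list)
for r in rows:
    by_device[r["device_type"]].append(r["ms"])

for device_type, lags in sorted(by_device.items()):
    print(device_type, len(lags), round(max(lags), 1))
```

Five rows are far too few to say anything about platforms or device types; the point is only that the same grouping could be run over a larger export.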

devtekve commented 1 year ago

On Sunnypilot I have also experienced this and tried to debug it, with no success. I saw it with carState, not sure if that helps. I believe it is related to cereal, but I still have a big knowledge gap with OP.

devtekve commented 1 year ago

Oh, something else I noticed: being connected over cellular produced more cumLag than being connected over Wi-Fi, and having no network connectivity at all yielded the lowest cumLag. I feel there might be a correlation somewhere.

devtekve commented 1 year ago

IMPORTANT UPDATE: This issue seems to be more consistent with a HW failure. We asked the user to check the cables.

View original (Likely irrelevant)

I have another user who has also been experiencing controls lagging. We asked him to install stock OP to rule it out, and he still has significantly high cumLag. He has a C3 device and a Kia EV6 (CAN FD).

`a12a00d7b630927b|2023-10-25--00-05-38`: lag reaches 450 ms on this route ![image](https://github.com/commaai/openpilot/assets/7696966/47cfd9e9-4b9c-4a27-88fc-5218f25a89a4)

`a12a00d7b630927b|2023-10-24--10-57-19`: morning route, about 260 ms ![image](https://github.com/commaai/openpilot/assets/7696966/9fe9f3bf-df01-43f8-9dce-6b23954a3543)

`a12a00d7b630927b|2023-10-24--17-27-26`: evening route, about 860 ms at the end ![image](https://github.com/commaai/openpilot/assets/7696966/f73bb2ce-c655-43b7-9def-761f8f3b899e)

sunnyhaibin commented 1 year ago

Another user also experienced this issue. This is stock openpilot + 2023 Hyundai Palisade HDA2 car port changes. Seems to be related to pings with Athena but I could be wrong. Any information we can provide for troubleshooting?

https://github.com/commaai/openpilot/assets/47793918/749f9804-ed44-474d-969d-139d7dba869d

devtekve commented 1 year ago

The following routes are my own routes, I have a C3X

This is on a very old commit (after long was enabled on ioniq phev), https://github.com/commaai/openpilot/commit/a552fafd8864bfb31a6e0f042f2816f276f6e66e, route: e1107f9d04dfb1e2|2023-10-17--18-41-47


One ride with stock OP (nlp-driving https://github.com/commaai/openpilot/tree/ace3a59dad2c5c366bccf09cfca9a7a1d75b3501) under these conditions: e1107f9d04dfb1e2|2023-10-25--10-01-54

Ride with stock OP (nlp-driving https://github.com/commaai/openpilot/tree/ace3a59dad2c5c366bccf09cfca9a7a1d75b3501) under these conditions: e1107f9d04dfb1e2|2023-10-25--09-46-59--1

Ride with stock OP (nlp-driving https://github.com/commaai/openpilot/tree/ace3a59dad2c5c366bccf09cfca9a7a1d75b3501) under these conditions: e1107f9d04dfb1e2|2023-10-25--09-39-32


devtekve commented 1 year ago

I am also seeing that spiChecksumErrorCount goes up at roughly the same rate as cumLag.
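
One way to put a number on "the same rate" is to correlate the per-sample increments of the two counters. A minimal sketch follows, assuming both values have already been sampled at the same timestamps. The sample lists are hypothetical placeholders, not data from any route in this thread, and the helper names are made up for illustration.

```python
import math


def deltas(samples):
    """Per-step increments of a cumulative counter."""
    return [b - a for a, b in zip(samples, samples[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # a flat series has no defined correlation
    return cov / math.sqrt(vx * vy)


# HYPOTHETICAL samples, aligned in time; replace with values read from a log.
spi_checksum_errors = [0, 0, 4, 4, 9, 9]
cum_lag_ms = [0.0, 0.0, 40.0, 40.0, 90.0, 90.0]

r = pearson(deltas(spi_checksum_errors), deltas(cum_lag_ms))
print(f"correlation of increments: {r:.3f}")
```

Correlating the increments rather than the raw cumulative values matters here: two counters that only ever grow will look correlated even if they are unrelated.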

devtekve commented 1 year ago

I seem to have found a reliable way to increase cumLag at will!

Running /data/openpilot/panda/tests/health_test.py reproduces the spike in spiChecksumErrorCount and the cumLag!

route: e1107f9d04dfb1e2|2023-10-27--12-36-59

That route is with sunnypilot, but I was completely still, with the car in park. The spikes in cumLag and spiChecksumErrorCount were triggered by me intentionally starting health_test.py, stopping it for a while, then starting it again.

For this test I had also disabled ubloxd (I have a C3X).
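
The start/stop procedure can be scripted so the load windows are easy to line up against the plots afterwards. This is a minimal sketch, not a tested reproduction tool: the durations are arbitrary, `toggle_load` is a made-up helper, and launching the script through the current Python interpreter is an assumption about how it is run on the device.

```python
import subprocess
import sys
import time

# Path taken from the comment above; running it via sys.executable is an assumption.
HEALTH_TEST_CMD = [sys.executable, "/data/openpilot/panda/tests/health_test.py"]


def toggle_load(cmd, on_seconds, off_seconds, cycles):
    """Run cmd for on_seconds, stop it, idle for off_seconds, and repeat.

    Returns a list of (start, end) time.monotonic() pairs, one per cycle.
    """
    windows = []
    for _ in range(cycles):
        start = time.monotonic()
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        try:
            time.sleep(on_seconds)
        finally:
            proc.terminate()  # ask the process to stop
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # force it if it ignores the request
                proc.wait()
        windows.append((start, time.monotonic()))
        time.sleep(off_seconds)
    return windows
```

For example, `toggle_load(HEALTH_TEST_CMD, 30, 30, 3)` would give three 30-second load windows separated by 30 seconds of idle. If the spikes in spiChecksumErrorCount and cumLag fall inside the returned windows and nowhere else, that supports the link.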

sunnyhaibin commented 1 year ago

Another user also experienced this issue. This is stock openpilot + 2023 Hyundai Palisade HDA2 car port changes. ~Seems to be related to pings with Athena but I could be wrong.~ Any information we can provide for troubleshooting?

Recording.2023-10-25.104624.mp4

Same route, spiChecksumErrorCount is spiking badly and seems to be causing everything else to go crazy too.

https://github.com/commaai/openpilot/assets/47793918/db2ac71f-e277-42cd-80f3-02fb52c431e2

devtekve commented 1 year ago

@adeebshihadeh @sshane I've run out of ideas and things to check :( I am not too familiar with the Panda code, but the error seems to be there. Can we please bump the priority on this? We've spent so many hours trying to tackle this already and it's been extremely challenging.

This is some output from debug_console.py:

comma@tici:/data/openpilot/panda$ tests/debug_console.py

************************ MAIN START ************************
Config:
  Board type: Tres
detected car harness with orientation 01
**** INTERRUPTS ON ****
- incorrect header sync or checksum 11 11 11 11 11 11 11
Interrupt 0x00000036 fired too often (0x00000002/s)!
- incorrect data checksum 0000
5a 81 00 00 c0 07 b7

- incorrect header sync or checksum 11 11 11 11 11 11 11
- incorrect data checksum 000e
5a 03 0e 00 00 00 fc
13 00 1a 13 00 1a 13 00 1a 13 00 1a 13 00

- incorrect data checksum 0007
5a 00 07 00 40 00 b6
13 01 00 13 01 00 13

- incorrect data checksum 000e
5a 03 0e 00 00 00 fc
13 00 1a 13 00 1a 13 00 1a 13 00 1a 13 00

- incorrect data checksum 0007
5a 00 07 00 00 00 f6
13 00 00 13 00 00 13

- incorrect data checksum 0007
5a 00 07 00 40 00 b6
13 01 00 13 01 00 13

- incorrect data checksum 0000
5a 81 00 00 c0 07 b7

- incorrect data checksum 000e
5a 03 0e 00 00 00 fc
13 00 1a 13 00 1a 13 00 1a 13 00 1a 13 00

- incorrect header sync or checksum 11 11 11 11 11 11 11
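
To help read the dump above, here is a minimal sketch that decodes the 7-byte lines starting with `5a`. It assumes a header layout of sync byte, endpoint, 16-bit little-endian data length, 16-bit little-endian max receive length, and a trailing XOR checksum seeded with `0xAB`. That layout and seed are assumptions about panda's SPI protocol, not something stated in this thread, so treat the field names as labels for illustration.

```python
import struct

SYNC = 0x5A            # assumed header sync byte
CHECKSUM_START = 0xAB  # assumed XOR seed


def xor_checksum(data: bytes) -> int:
    """XOR all bytes onto the seed; 0 means data plus its checksum byte agree."""
    result = CHECKSUM_START
    for byte in data:
        result ^= byte
    return result


def parse_header(hex_line: str) -> dict:
    """Decode one 7-byte header printed as space-separated hex."""
    raw = bytes.fromhex(hex_line)
    if len(raw) != 7:
        raise ValueError("expected 7 header bytes")
    sync, endpoint, data_len, max_rx_len, _checksum = struct.unpack("<BBHHB", raw)
    return {
        "sync_ok": sync == SYNC,
        "checksum_ok": xor_checksum(raw) == 0,
        "endpoint": endpoint,
        "data_len": data_len,
        "max_rx_len": max_rx_len,
    }


for line in ("5a 03 0e 00 00 00 fc", "5a 81 00 00 c0 07 b7", "11 11 11 11 11 11 11"):
    print(line, parse_header(line))
```

Under these assumptions, every `5a ...` header in the dump passes its own checksum, and the decoded data length matches both the number printed after "incorrect data checksum" and the count of bytes dumped on the following line (14 for `000e`, 7 for `0007`). That would mean the header arrives intact and only the payload is wrong, while the `11 11 11 ...` lines fail both the sync and checksum checks.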

adeebshihadeh commented 6 months ago

Tracking in #32286