commaai / panda

code powering the comma.ai panda
MIT License

H7: CAN Bus Disconnected during button press send to car #1452

Closed sunnyhaibin closed 1 year ago

sunnyhaibin commented 1 year ago

On some HKG CAN-FD cars, sending the resume button press to the car causes a "CAN Bus Disconnected" error, and the car has to be restarted to clear it.

We tried reverting this commit, and users no longer get this error. We are unsure how the commit is relevant, but since the revert no user has experienced the issue across many stop-and-go traffic sessions with openpilot sending the resume button to the car.

This issue is also seen on sunnypilot 0.9.2 but did not happen on previous versions (we are not certain when it started, but it was not an issue as of December 16th, 2022).

Affected Platforms

Affected Routes

Affected Route on sunnypilot

adeebshihadeh commented 1 year ago

Bus 1 is going into the bus off state. @briskspirit can you look into this?

adeebshihadeh commented 1 year ago

@sunnyhaibin how did you come up with that commit? did you narrow it down to exactly that one or only tested before/after that one? The commit only changes behavior on init, and doesn't even run on the red panda. It would be helpful if you guys can git bisect to the exact commit since it's easy to repro.

sunnyhaibin commented 1 year ago

> @sunnyhaibin how did you come up with that commit? did you narrow it down to exactly that one or only tested before/after that one? The commit only changes behavior on init, and doesn't even run on the red panda. It would be helpful if you guys can git bisect to the exact commit since it's easy to repro.

@adeebshihadeh This came up with git bisect. It's a bit confusing, as the other commits that came up didn't seem to be related to the red panda; this was the closest one.
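For readers following the bisect discussion, here is a minimal Python sketch of the search that `git bisect` automates. The function name `first_bad_commit` and the `is_bad` callback are hypothetical and used only for illustration. It assumes the commits are ordered oldest to newest, the first one is known good, the last one is known bad, and the test is reliable. If the reproduction is flaky, the search can land on an unrelated commit, which may be what happened here.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit by binary search.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: commits c0..c9, with the regression introduced in c6.
history = [f"c{i}" for i in range(10)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 6)
```

Each call to `is_bad` stands in for flashing a build and trying to reproduce the bus off condition in the car, so the number of test drives grows only logarithmically with the number of commits.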

adeebshihadeh commented 1 year ago

I couldn't reproduce this on our EV6 on openpilot master, even sending 50 button msgs every frame. @sunnyhaibin do you have a reliable way to reproduce this? Ideally a clean branch based on master.

We've reproduced it; fix coming soon!

sunnyhaibin commented 1 year ago

We were attempting to enable openpilot longitudinal control on the Ioniq 6 2023 HDA2 by disabling the ADAS Driving ECU 0x730 via bus 1, the same method used to enable openpilot longitudinal for currently supported HDA2 cars. It seems that as soon as the ECU is disabled (confirmed in cabana and plotjuggler), bus 1 goes into the bus off state and there is no more traffic on bus 1. This did not affect bus 0 or 2, however.

briskspirit commented 1 year ago

Is it the only ECU on the bus?

sunnyhaibin commented 1 year ago

> Is it the only ECU on the bus?

It is not, AFAIK. ECUs that broadcast MDPS, SCC, ESP, etc. are also on bus 1.

sunnyhaibin commented 1 year ago

We have another route that seems to have triggered this issue:

> Hey @sunnyhaibin, I had a CAN bus error while driving today. I was kind of spamming the off and on instead of letting the car slam on the brakes due to the long stock radar. Could have been my actions, but I wanted to share just in case. I rebooted in the middle of a drive and had no issues. I've got a few hours of time on the most recent update and this is the first fault.

briskspirit commented 1 year ago

@sunnyhaibin messaged you with the branch to try

sunnyhaibin commented 1 year ago

Sent @briskspirit a DM with the routes that had a 100% success rate of button sends without putting bus 1 into the bus off state; posting them here as well for visibility. These were recorded with https://github.com/commaai/panda/pull/1615 applied:

696748e0ac8082fb|2023-09-02--19-41-51

696748e0ac8082fb|2023-09-02--20-52-30

696748e0ac8082fb|2023-09-02--22-26-43

696748e0ac8082fb|2023-09-03--12-21-33

696748e0ac8082fb|2023-08-31--14-25-28

28aa956828c3407d|2023-09-03--17-16-47

caf6a54b6d467dbd|2023-09-02--11-59-54

sunnyhaibin commented 1 year ago

A route from sunnypilot 0.9.4.1 with a 100% success rate of button sends: fc19648042eb6896|2023-09-05--14-22-53

briskspirit commented 1 year ago

Screenshot from 2023-09-05 13-34-32

Data looks promising so far! In this route the bus off state happened during the first button spam session; the CAN core was reset and continued functioning as normal.
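To illustrate the recovery behavior described above, here is a simplified Python model of CAN fault confinement with a reset on bus off. It is a sketch, not the panda firmware code. The class `CanController` and the helper `send_with_recovery` are hypothetical names, the model tracks only the transmit error counter (real controllers also keep a receive error counter and apply a few exceptions to the counting rules), and it uses the usual CAN thresholds: a transmit error adds 8, a successful transmit subtracts 1, error passive starts at 128, and bus off starts at 256.

```python
from dataclasses import dataclass

ERROR_PASSIVE_THRESHOLD = 128  # counter >= 128: node becomes error passive
BUS_OFF_THRESHOLD = 256        # counter >= 256: node goes bus off


@dataclass
class CanController:
    """Simplified model tracking only the transmit error counter (TEC)."""
    tec: int = 0
    resets: int = 0

    @property
    def state(self) -> str:
        if self.tec >= BUS_OFF_THRESHOLD:
            return "bus_off"
        if self.tec >= ERROR_PASSIVE_THRESHOLD:
            return "error_passive"
        return "error_active"

    def transmit(self, ok: bool) -> bool:
        """Attempt one frame; a bus off node cannot transmit at all."""
        if self.state == "bus_off":
            return False
        if ok:
            self.tec = max(0, self.tec - 1)
        else:
            self.tec += 8
        return ok

    def reset_core(self) -> None:
        """Model a core reset: clear the error counter and rejoin the bus."""
        self.tec = 0
        self.resets += 1


def send_with_recovery(ctrl: CanController, outcomes: list) -> int:
    """Send one frame per outcome, resetting the core whenever it is bus off."""
    sent = 0
    for ok in outcomes:
        if ctrl.state == "bus_off":
            ctrl.reset_core()
        if ctrl.transmit(ok):
            sent += 1
    return sent


# Hypothetical burst: 32 failed transmits push the counter to 256 (bus off),
# then the core is reset once and the next 10 frames go out normally.
ctrl = CanController()
delivered = send_with_recovery(ctrl, [False] * 32 + [True] * 10)
```

Without the reset step, the modeled node would stay silent after the burst, which matches the original symptom of bus 1 staying dead until the car was restarted.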

briskspirit commented 1 year ago

merged to master, let me know if something goes wrong

VoltIcaRus commented 1 year ago

@sunnyhaibin @briskspirit

fc19648042eb6896|2023-09-07--16-40-59--3

Used Sunny's test-c3-vw-custom-stock-long

Something is wrong. Could you please take a look at this? I'm not sure, because I was in the passenger seat and not driving, but I think the driver pressed the resume button without setting the cruise. As a result, the CAN bus was disconnected and several warning lights on the vehicle's dashboard cycled on and off. I immediately restarted the vehicle and the problem went away. It may not be related to this issue; I'm not sure.

For your information, my car is a Sorento HEV, whose support is not merged yet, so I'm having it recognized as a Sorento PHEV.

briskspirit commented 1 year ago

@VoltIcaRus I don't see this as panda related. According to the logs, panda worked fine, but controls were not allowed. This ticket is the wrong place to post.

VoltIcaRus commented 1 year ago

Yes. If that's not a problem, great. The resume button was pressed, the CAN bus was released, and it was recorded. Glad to hear it's not related, thank you for your research!