commaai / openpilot

openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system in 275+ supported cars.
https://comma.ai/openpilot
MIT License
49.81k stars 9.08k forks source link

IsoTpParallelQuery hits total timeout on Mazda #27361

Closed MoreTore closed 1 year ago

MoreTore commented 1 year ago

Describe the bug

Finger printing has been slow when using 9.0 for months. I thought it was just my car but it seems to work fine on my C2 with 0.8.13. I have been putting off reporting this issue. This issue only happens after clearing the cached fingerprint.

Here is the cleaned up tmux output.

Waiting for wayland
wayland ready
adjusting time from '2022-08-31 10:27:49.994168' to '2023-02-16 14:50:10'
Thu 16 Feb 2023 02:50:10 PM UTC
INFO:root:System time valid
fatal: no upstream configured for branch 'new_mazda_C3_stage'
Using Wayland-EGL
gbm_create_device(156): Info: backend name is: msm_drm
qt.qpa.wayland: "wl-shell" is a deprecated shell extension, prefer using "xdg-shell-v6" or "xdg-shell" if supported by the compositor by se
tting the environment variable QT_WAYLAND_SHELL_INTEGRATION
Using the 'wl-shell' shell integration
qt.qpa.wayland: Wayland does not support QWindow::requestActivate()
scons: Reading SConscript files ...
Git commit hash for gitversion.h: dc9270f6
scons: warning: Calling missing SConscript without error is deprecated.
Transition by adding must_exist=False to SConscript calls.
Missing SConscript 'selfdrive/navd/SConscript'
File "/data/openpilot/SConstruct", line 436, in <module>
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.
corrupted double-linked list
selfdrive/loggerd/bootlog.cc: bootlog to /data/media/0/realdata/boot/2023-02-16--08-50-29
fatal: no upstream configured for branch 'new_mazda_C3_stage'
gbm_create_device(156): Info: backend name is: msm_drm
: QPixmap::scaleWidth: Pixmap is a null pixmap
: "wl-shell" is a deprecated shell extension, prefer using "xdg-shell-v6" or "xdg-shell" if supported by the compositor by setting the envi
ronment variable QT_WAYLAND_SHELL_INTEGRATION
: Wayland does not support QWindow::requestActivate()
NVME not mounted
Panda 1b0023001850335332313620 connected, version: DEV-dc9270f6-DEBUG, signature ab79f3c5faf4c5c0, expected ab79f3c5faf4c5c0
selfdrive/boardd/main.cc: starting boardd
selfdrive/boardd/boardd.cc: attempting to connect
selfdrive/boardd/boardd.cc: connected to board
./selfdrive/sensord/sensors/i2c_sensor.h: Reading chip ID failed: -107
./selfdrive/sensord/sensors/i2c_sensor.h: Reading chip ID failed: -107
selfdrive/sensord/sensors/bmx055_magn.cc: Enabling power failed: -107
./selfdrive/sensord/sensors/i2c_sensor.h: Reading chip ID failed: -107
ioctl returned -1 with errno 22
Waiting for CAN messages...
IOCTL_KGSL_DRAWCTXT_CREATE: creating context with flags 0x4001ad2
selfdrive/loggerd/loggerd.cc: logging to /data/media/0/realdata/2023-02-16--08-52-15--0
waiting for modem to come up
ioctl returned -2147482595 with errno 2
dt is 12
selfdrive/locationd/ubloxd.cc: starting ubloxd
child selfdrive.updated got SIGINT
ond torqued controlsd deleter dmonitoringd laikad rawgpsd navd pandad paramsd ubloxd pigeond plannerd radard thermald tombstoned updated up
loader 
loaded model with size: 4443461
Getting VIN & FW versions
dt is 12
ioctl returned -1 with errno 22
model: input_img -> outputs
input product is 1382400
adding index 1: calib
dt is 12
response: ''
IOCTL_KGSL_DRAWCTXT_CREATE: creating context with flags 0x4001ad2
Thneed::clinit done
Thneed::load: loading from models/supercombo.thneed
response: ''
system.micd lagging by 517.31 ms
system.micd lagging by 417.49 ms
system.micd lagging by 317.56 ms
system.micd lagging by 217.62 ms
system.micd lagging by 117.67 ms
system.micd lagging by 17.73 ms
response: ''
Thneed::load: adding input features_buffer with size 50688
Thneed::load: adding input traffic_convention with size 8
Thneed::load: adding input desire with size 3200
Thneed::load: adding input big_input_imgs with size 1572864
Thneed::load: adding input input_imgs with size 1572864
Thneed::save: adding output with size 24432
selfdrive/modeld/modeld.cc: models loaded, modeld starting
response: ''
response: ''
quectel setup done
no almanac backup found
Sending current time to ublox
Pigeon GPS on!
Starting listener for: camerad
selfdrive/loggerd/encoderd.cc: encoder fcamera.hevc init 1928x1208
selfdrive/loggerd/encoderd.cc: encoder dcamera.hevc init 1928x1208
selfdrive/loggerd/encoderd.cc: encoder ecamera.hevc init 1928x1208
selfdrive/modeld/dmonitoringmodeld.cc: connected with buffer size: 4804608
selfdrive/modeld/modeld.cc: connected main cam with buffer size: 4804608 (1928 x 1208)
selfdrive/modeld/modeld.cc: connected extra cam with buffer size: 4804608 (1928 x 1208)
selfdrive/modeld/modeld.cc: vipc_client_main no frame
iso-tp query bad response: (2017, None) - 0x7f2231
iso-tp query bad response: (1988, None) - 0x7f2231
selfdrive/modeld/modeld.cc: skipping model eval. Dropped -1 frames
iso-tp query bad response: (2016, None) - 0x7f2231
iso-tp query bad response: (2017, None) - 0x7f2231
iso-tp query bad response: (1799, None) - 0x7f1012
iso-tp query bad response: (1856, None) - 0x7f1012
ond torqued controlsd deleter dmonitoringd laikacamerad clocksd logcatd proclogd logmessaged micd timezoned dmonitoringmodeld encoderd logg
erd modeld sensord ui soundd locationd calibrationd torqued controlsd deleter dmonitoringd laikad rawgpsd navd pandad paramsd ubloxd pigeon
d plannerd radard thermald tombstoned uploader 
selfdrive/loggerd/loggerd.cc: rotated to /data/media/0/realdata/2023-02-16--08-52-15--1
iso-tp query timeout while receiving data
Error processing UDS response: (1882, None)
Traceback (most recent call last):
  File "/data/openpilot/selfdrive/car/isotp_parallel_query.py", line 108, in get_data
    dat, updated = msg.recv()
  File "/data/openpilot/panda/python/uds.py", line 445, in recv
    self._isotp_rx_next(msg)
  File "/data/openpilot/panda/python/uds.py", line 488, in _isotp_rx_nextA
    assert self.rx_idx & 0xF == rx_data[0] & 0xF, "isotp - rx: invalid consecutive frame index"
AssertionError: isotp - rx: invalid consecutive frame index
Error processing UDS response: (1882, None)
Traceback (most recent call last):
  File "/data/openpilot/selfdrive/car/isotp_parallel_query.py", line 108, in get_data
    dat, updated = msg.recv()
  File "/data/openpilot/panda/python/uds.py", line 445, in recv
    self._isotp_rx_next(msg)
  File "/data/openpilot/panda/python/uds.py", line 488, in _isotp_rx_next
    assert self.rx_idx & 0xF == rx_data[0] & 0xF, "isotp - rx: invalid consecutive frame index"
AssertionError: isotp - rx: invalid consecutive frame index
iso-tp query bad response: (2016, None) - 0x7f2231
iso-tp query bad response: (2017, None) - 0x7f2231
iso-tp query bad response: (2016, None) - 0x7f2231
iso-tp query bad response: (2017, None) - 0x7f2231
iso-tp query bad response: (2017, None) - 0x7f2231
iso-tp query bad response: (2016, None) - 0x7f2231
iso-tp query bad response: (2016, None) - 0x7f2231
iso-tp query bad response: (2017, None) - 0x7f2231
iso-tp query bad response: (1840, None) - 0x7f2231
iso-tp query bad response: (2016, None) - 0x7f3e13
iso-tp query bad response: (2016, None) - 0x7f2231
Fingerprinted MAZDA 3 2019 using fuzzy match. 2 matching ECUs
Fingerprinted MAZDA 3 2019 using fuzzy match. 3 matching ECUs
VIN JM1BPBML7M1302695
selfdrive/boardd/boardd.cc: finished FW query at OBD port
selfdrive/boardd/boardd.cc: waiting for params to set safety model
{"event": "fingerprinted", "car_fingerprint": "MAZDA 3 2019", "source": 1, "fuzzy": true, "cached": false, "fw_count": 9, "ecu_responses":
[[1848, null, 1], [1864, null, 1], [1807, null, 1], [1848, null, 0], [1896, null, 1], [1900, null, 1], [2024, null, 0], [2024, null, 1], [1
996, null, 1], [1896, null, 0], [2025, null, 1], [1900, null, 0], [2025, null, 0]], "vin_rx_addr": 2024, "error": true}
selfdrive/boardd/boardd.cc: got 1392 bytes CarParams
selfdrive/boardd/boardd.cc: panda 0: setting safety model: 29, param: 0, alternative experience: 1
ond torqued controlsd deleter dmonitoringd laikacamerad clocksd logcatd proclogd logmessaged micd timezoned dmonitoringmodeld encoderd logg
erd modeld sensord ui soundd locationd calibrationd torqued controlsd deleter dmonitoringd laikad rawgpsd navd pandad paramsd ubloxd pigeon
d plannerd radard thermald tombstoned uploader 
selfdrive/loggerd/loggerd.cc: rotated to /data/media/0/realdata/2023-02-16--08-52-15--6
selfdrive/loggerd/loggerd.cc: rotated to /data/media/0/realdata/2023-02-16--08-52-15--7
selfdrive/loggerd/loggerd.cc: closing logger
child system.micd got SIGINT
caching torque params
child selfdrive.locationd.calibrationd got SIGINT
caught sig 2, disabling quectel gps
child selfdrive.navd.navd got SIGINT
child selfdrive.locationd.laikad got SIGINT
SIGINT received, exiting
ioctl returned -1 with errno 1
ioctl returned -2147482611 with errno 1
ioctl returned -2147482611 with errno 1
ioctl returned -2147482611 with errno 1
ioctl returned -2147482611 with errno 1
ioctl returned -2147482611 with errno 1
Storing almanac in ublox flash
child selfdrive.controls.radard got SIGINT
Stopping listener for: camerad
loader 
response: ''
response: ''
quectel cleanup done
No response from ublox
child selfdrive.monitoring.dmonitoringd got SIGINT
child selfdrive.locationd.paramsd got SIGINT
child selfdrive.controls.plannerd got SIGINT

Provide a route where the issue occurs

164080f7933651c4|2023-02-16--08-52-15--0

openpilot version

0.9.0.x

Additional info

No response

jyoung8607 commented 1 year ago

This is probably similar to #25398 except the unfinished query (the one with the invalid consecutive frame index) is stalled at a different layer of the UDS/FP stack.

Can you identify which query is failing at that stage? Run fw_versions.py --debug with openpilot shutdown to watch the process. Is it critical to the Mazda fingerprint? I think there were some ISO-TP/UDS timing compliance fixes (mainly for HKG) that might be of value for your multipart response issue, I can't recall if those fixes are part of 0.9.0.

sshane commented 1 year ago

Firstly this is a fork, and this car is unsupported in master currently. But it looks like your car is fuzzy matching, not exact matching with its FW.

We sort the brands to query FW versions by likelihood (based on present ECUs) and we break out of the loop whenever we get an exact match. If the car only fuzzy matches, it will run through everything for the chance that the second likeliest brand is the correct brand, and we get an exact match there.

Your fwdCamera ECU is missing on your recent routes, I'd check why that is.

sshane commented 1 year ago

Going to re-open until I check the timeouts to make sure they're working properly. @MoreTore Can you also connect your device to WiFi to upload logs? https://github.com/commaai/openpilot/pull/25897 may be related

MoreTore commented 1 year ago

So the can gateway is not on the camera, its on the BCM. Not sure if that matters since its using the OBD port. But anyways 164080f7933651c4|2023-02-16--08-52-15 should be uploaded.

adeebshihadeh commented 1 year ago

@sshane can we close this?

MoreTore commented 1 year ago

yea this is my fault