commaai / opendbc


CANParser: 2x faster parsing #1039

Closed · deanlee closed this 1 month ago

deanlee commented 2 months ago

Issue: the following two code blocks account for nearly two-thirds of the runtime of the Cython update_string() method:

for v in self.vl_all.values():
  for l in v.values():  # no-cython-lint
    l.clear()

while it != new_vals.end():
  cv = &deref(it)
  # Cast char * directly to unicode
  cv_name = <unicode>cv.name
  self.vl[cv.address][cv_name] = cv.value
  self.vl_all[cv.address][cv_name] = cv.all_values
  self.ts_nanos[cv.address][cv_name] = cv.ts_nanos
  updated_addrs.insert(cv.address)
  preinc(it)

Resolution:

  1. Clear the per-signal value lists via the address keys only, instead of via every key: the name keys alias the same inner dicts as the address keys, so the old loop cleared each list twice. This change halves the execution time of the first block of code.
  2. Since the vector returned by CanParser::update_strings() groups data by address, caching the dictionary references for each address avoids redundant lookups. This simple adjustment cuts execution time by at least half, often more.
  3. Disable the wraparound and boundscheck directives; update_string() does not require these checks, and disabling them can slightly improve performance (a sketch of all three changes follows this list).
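
For illustration, here is a minimal Cython-style sketch of the three changes above, not the exact diff in this PR. It assumes a hypothetical self.addresses set holding the parsed message addresses, and that self.vl, self.vl_all, and self.ts_nanos map both the address and the message name of each message to the same inner dict:

# (3) at the very top of the .pyx file: skip index checks that update_string() does not need
# cython: boundscheck=False, wraparound=False

# (1) clear each signal's value list once, via the address keys only;
#     the name keys alias the same inner dicts, so they are cleared as well
for addr in self.addresses:
  for l in self.vl_all[addr].values():
    l.clear()

# (2) new_vals is grouped by address, so refresh the cached dict
#     references only when the address changes
cdef unsigned int cur_address = 0
vl = vl_all = ts_nanos = None
while it != new_vals.end():
  cv = &deref(it)
  if vl is None or cv.address != cur_address:
    cur_address = cv.address
    vl = self.vl[cv.address]
    vl_all = self.vl_all[cv.address]
    ts_nanos = self.ts_nanos[cv.address]
    updated_addrs.insert(cv.address)
  cv_name = <unicode>cv.name
  vl[cv_name] = cv.value
  vl_all[cv_name] = cv.all_values
  ts_nanos[cv_name] = cv.ts_nanos
  preinc(it)

With the grouping by address, the dict lookups and the updated_addrs insert run once per message instead of once per signal value.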

Results of running selfdrive/debug/check_can_parser_performance.py:

Before:

6000 CAN packets, 10 runs
283.51 mean ms, 297.98 max ms, 275.58 min ms, 5.45 std ms
0.0473 mean ms / CAN packet

After:

6000 CAN packets, 10 runs
128.54 mean ms, 148.90 max ms, 119.80 min ms, 9.34 std ms
0.0214 mean ms / CAN packet

sshane commented 1 month ago

Since the vector returned by CanParser::update_strings() groups data by address, caching the dictionary references for each address avoids redundant lookups. This simple adjustment cuts execution time by at least half, often more.

Perhaps we refactor the C++ CAN Parser to structure the signals together?

Disable the wraparound and boundscheck directives; update_string() does not require these checks, and disabling them can slightly improve performance.

This did not seem to make much of a difference for me on PC or device. master: 750 mean ms, checks disabled: 760 mean ms.


master:

6000 CAN packets, 20 runs
751.04 mean ms, 762.70 max ms, 733.57 min ms, 12.36 std ms
0.1252 mean ms / CAN packet

This branch before my commits:

6000 CAN packets, 20 runs
296.84 mean ms, 299.24 max ms, 290.97 min ms, 1.70 std ms
0.0495 mean ms / CAN packet

This branch w/ clean ups:

6000 CAN packets, 20 runs
293.75 mean ms, 295.81 max ms, 289.66 min ms, 1.42 std ms
0.0490 mean ms / CAN packet