ebroecker / canmatrix

Converting CAN (Controller Area Network) Database Formats .arxml .dbc .dbf .kcd ...
BSD 2-Clause "Simplified" License

Many errors are reported when parsing a DBC file with Chinese characters and special characters #737

Closed: Liluoquan closed this issue 6 months ago

Liluoquan commented 12 months ago

When I use canmatrix to load a DBC file whose signal names contain Chinese characters and special characters, for example with `_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding=encoding)`, errors like this are reported:

error with line no: 2004
b' SG_ PSDCU_RR\xe4\xb8\xbb\xe8\xbd\xaf\xe4\xbb\xb6\xe7\x89\x88\xe6\x9c\xac\xe5\x8f\xb7$_W : 63|8@0+(1,0)[0|255] "" Vector__XXX\r\n'

The original line looks like this: `SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX`

Then I found that canmatrix uses a regex to match each line of the DBC file. For lines starting with `SG_` it uses the following pattern:

`pattern = r"^SG_ +(\w+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"`

The group `(\w+)` cannot match special characters such as `$` in Python 3.8 (and, when a pattern like this is applied to raw bytes, it cannot match the bytes of Chinese characters either), so I suggest changing the regex above to the following, to handle the scenarios described in this issue:

`pattern = r"^SG_ +(\S+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"`

Please reply, it's very important to me!
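
The difference between the `(\w+)` and `(\S+)` variants can be checked against the sample line from this issue. This is a minimal sketch, assuming an `SG_` pattern of the shape quoted above with the backslash escapes and the optional-space ` *` parts restored; the exact pattern in a given canmatrix release may differ. Only the name group differs between the two compiled variants, and the line is stripped first because `SG_` lines in a DBC file are indented and end with CRLF.

```python
import re

# Assumption: signal-line pattern of the shape quoted in this issue;
# only the leading name group differs between the two variants.
TAIL = (
    r" *: *(\d+)\|(\d+)@(\d+)([\+|\-])"       # start bit | length @ byte order, sign
    r" *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\)"  # (factor, offset)
    r" *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\]"   # [min|max]
    r' +"(.*)" +(.*)'                         # "unit" receivers
)
ORIGINAL = re.compile(r"^SG_ +(\w+)" + TAIL)
PROPOSED = re.compile(r"^SG_ +(\S+)" + TAIL)

line = ' SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX\r\n'
decoded = line.strip()  # drop the indentation and the trailing CRLF

print(ORIGINAL.match(decoded))  # None: "$" is not a word character
m = PROPOSED.match(decoded)
name, start, length, order, sign, factor, offset, lo, hi, unit, receivers = m.groups()
print(name, start, length, order, sign, factor, offset, lo, hi, repr(unit), receivers)
```

`\S+` stops at the first space, and backtracking still lets the pattern match if a name is written directly against the colon; the trade-off is that it also accepts characters that are not valid in a strict DBC identifier.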

ebroecker commented 12 months ago

Hi @Liluoquan

you have to specify the encoding with the `dbcImportEncoding` option.

Maybe something like `dbcImportEncoding="utf8"`.
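
The option only controls how the raw bytes of the file are decoded. One way to choose a value is to try candidate codecs on the raw bytes of a failing line, such as the ones printed in this thread. A minimal sketch; `guess_encoding` is a hypothetical helper, not part of canmatrix, and the candidate list is an assumption.

```python
def guess_encoding(raw: bytes, candidates=("utf-8", "gbk")):
    """Return the first candidate codec that decodes raw without errors (hypothetical helper)."""
    for codec in candidates:
        try:
            raw.decode(codec)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes, try the next one
        return codec
    return None  # no candidate fits

# Raw lines copied from the error messages in this issue
utf8_line = b' SG_ PSDCU_RR\xe4\xb8\xbb\xe8\xbd\xaf\xe4\xbb\xb6\xe7\x89\x88\xe6\x9c\xac\xe5\x8f\xb7$_W : 63|8@0+(1,0)[0|255] "" Vector__XXX\r\n'
gbk_line = b' SG_ \xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W : 20|1@0+(1,0)[0|1] "" Vector__XXX\r\n'

print(guess_encoding(utf8_line))  # utf-8
print(guess_encoding(gbk_line))   # gbk
```

`utf-8` is tried first because it is strict and rejects most byte sequences from other encodings, whereas `gbk` can decode some UTF-8 byte sequences into the wrong characters without raising an error.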

ebroecker commented 11 months ago

Hi @Liluoquan

Any success?

Liluoquan commented 11 months ago

Hi @ebroecker, sorry, it didn't work when I used utf-8, GB2312 or gbk, e.g. `_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding='utf-8')`. The error is as follows:

error with line no: 28
b' SG_ \xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W : 20|1@0+(1,0)[0|1] "" Vector__XXX\r\n'
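
Why a different encoding alone does not help can be checked directly with Python's `re` module. In a `str` pattern, `\w` does match Chinese characters (they count as Unicode word characters) but not `$`; in a `bytes` pattern, `\w` only matches ASCII letters, digits and `_`, so every non-ASCII byte fails. `\S` matches any non-whitespace in both cases. A minimal sketch using the GBK-encoded signal name from the error above:

```python
import re

raw_name = b"\xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W"
name = raw_name.decode("gbk")  # decoded with the matching codec

# str patterns: \w is Unicode-aware, so CJK characters match but "$" does not
print(re.fullmatch(r"\w+", name))                                # None, because of "$"
print(re.fullmatch(r"\w+", name.replace("$", "")) is not None)  # True
print(re.fullmatch(r"\S+", name) is not None)                   # True

# bytes patterns: \w is ASCII-only, so the raw GBK bytes never match
print(re.fullmatch(rb"\w+", raw_name))                           # None
print(re.fullmatch(rb"\S+", raw_name) is not None)              # True
```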

ebroecker commented 11 months ago

Hi @Liluoquan

I did not read your issue completely the first time, sorry.

You already provided a potential fix, thanks for it! I'll add it soon.