ebroecker / canmatrix

Converting Can (Controller Area Network) Database Formats .arxml .dbc .dbf .kcd ...
BSD 2-Clause "Simplified" License

Escaped double quotes in DBC char string #526

Closed jackm closed 3 years ago

jackm commented 3 years ago

Currently canmatrix does not handle escaped double quotes in a DBC file character string.

For example, in a signal value description: VAL_ 291 Signal 0 "zero" 1 "one " 2 "string with \"escaped\" double quotes";

The DBC parser in canmatrix simply splits the line into tokens at every " character, which results in a list of strings like this: ['0 ', 'zero', ' 1 ', 'one', ' 2 ', 'string with \\', 'escaped\\', ' double quotes']
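To make the difference concrete, here is a minimal sketch comparing a plain split on " with an escape-aware regular expression. It is not canmatrix's actual parser code; the names `line`, `naive`, `QUOTED`, and `strings` are made up for illustration.

```python
import re

# The example line from above, written as a raw string so that the
# backslashes are literal characters.
line = r'VAL_ 291 Signal 0 "zero" 1 "one " 2 "string with \"escaped\" double quotes";'

# Naive approach: every double quote is treated as a delimiter, so the
# last description is broken into three pieces.
naive = line.split('"')

# Escape-aware approach (hypothetical pattern): inside the quotes, accept
# either a backslash followed by any character, or any character that is
# neither a double quote nor a backslash.
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')
strings = QUOTED.findall(line)

print(naive)
print(strings)
```

With the second approach the third description stays in one piece, with its escaped quotes still escaped.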

I am unsure whether escaped double quotes in a DBC character string are allowed at all by the proprietary spec, but I know that other programs that parse DBC files accept them and handle them appropriately. I have not yet tried Vector CANdb++ to see how it handles escaped double quotes.

jackm commented 3 years ago

Confirmed that loading a DBC file in CANdb++ with escaped double quotes in a character string does work and CANdb++ does not show any errors when opening the file. The escaped quotes appear in the value.

ebroecker commented 3 years ago

Thanks @jackm, could you please try the branch https://github.com/ebroecker/canmatrix/tree/iss526 ?

pip install git+https://github.com/ebroecker/canmatrix.git@iss526

jackm commented 3 years ago

It splits the tokens correctly now.

I noticed that it seems to add an additional backslash, though. For example, if you run the canconvert script with a DBC file containing escaped double quotes as the input and DBC as the output format, the output file is identical except that each backslash escaping a double quote is doubled.

$ cat a.dbc
VERSION "created by canmatrix"

NS_ :

BS_:

BU_: 

BO_ 17 Frame_1: 8 Vector__XXX
 SG_ Signal : 0|8@1- (1,0) [0|0] "" Vector__XXX

VAL_ 17 Signal 0 "zero" 1 "one " 2 "string with \"escaped\" double quotes";
$ canconvert a.dbc b.dbc
arxml is not supported
kcd is not supported
fibex is not supported
xls is not supported
xlsx is not supported
yaml is not supported
INFO - convert - Importing a.dbc ... 
INFO - convert - done

INFO - convert - Exporting b.dbc ... 
INFO - convert - 
INFO - convert - 1 Frames found
INFO - convert - done
$ diff a.dbc b.dbc 
22c22
< VAL_ 17 Signal 0 "zero" 1 "one " 2 "string with \"escaped\" double quotes";
---
> VAL_ 17 Signal 0 "zero" 1 "one " 2 "string with \\"escaped\\" double quotes";

ebroecker commented 3 years ago

OK, thanks for your feedback. Seems there's another bug...

ebroecker commented 3 years ago

OK, could you have another look? I may have fixed this as well now.

jackm commented 3 years ago

I believe it is good now; it will correctly tokenize the string leaving escaped quotes untouched and also treat double backslashes as a literal backslash.
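The behaviour described here can be sketched as a pair of inverse helpers. This only illustrates the escaping rules (backslash-quote is a quote, double backslash is a literal backslash); it is an assumption about the general technique, not the code in the iss526 branch, and `unescape` and `escape` are hypothetical names.

```python
import re


def unescape(text: str) -> str:
    # One left-to-right pass: a backslash followed by a double quote or a
    # backslash is replaced by that second character alone.
    return re.sub(r'\\(["\\])', r'\1', text)


def escape(text: str) -> str:
    # Inverse operation: put a backslash in front of every double quote
    # and every backslash.
    return re.sub(r'(["\\])', r'\\\1', text)


on_disk = r'string with \"escaped\" double quotes'
in_memory = unescape(on_disk)      # string with "escaped" double quotes
round_trip = escape(in_memory)     # should match the on-disk text again

print(in_memory)
print(round_trip == on_disk)
```

If the text is escaped on export without having been unescaped on import, the backslash runs grow on every conversion, so a DBC-to-DBC round trip is a convenient check that the two steps are balanced.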

It may be a good idea to use raw strings in your test cases to make it clearer that you want literal backslashes in the string.
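As a small illustration of the readability difference (the variable names and the test string are made up, not taken from the canmatrix test suite), the same DBC text can be written both ways in Python:

```python
# Regular string literal: each literal backslash must be written as two,
# and each double quote inside the double-quoted literal needs its own
# backslash, so the escaping is hard to read.
plain = "string with \\\"escaped\\\" double quotes"

# Raw string literal: the text looks exactly as it does in the DBC file.
raw = r'string with \"escaped\" double quotes'

# Both literals describe the same characters, including two backslashes.
print(plain == raw)
print(raw.count('\\'))
```

The raw form matches the file contents character for character, which makes it easier to see what a test case is feeding to the parser.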

ebroecker commented 3 years ago

You are right, good idea. I'll refactor to raw strings.