Parsing errors - Githubissues

Uight commented 2 months ago

while checking multiline support i ran through all dbc files form https://github.com/commaai/opendbc theres some Bu_ Parsing errors but a issue for thats already created. I removed duplicate errors that occur for the same reason. And also stuff like missing signals that actually dont exist in there or actual duplicates.

I also have to mention that on many of these i have no idea if the error is valid:

acura_ilx_2016_can_generated.dbc: line 153 und 154. Code is:

CM SG 304 "Seems to be platform-agnostic"; CM SG 316 "Should exist on Nidec";

Missing the Signal name. probably invalid;

acura_ilx_2016_nidec.dbc: line 180 and 181;

BO_TXBU 768 : NEO,ADAS; BO_TXBU 769 : NEO,ADAS;

We dont support seperate define trasceivers yet. I dont know any case where thats relevant

bmw_e9x_e8x.dbc'

CM SG 408 ParkButtonSecond "Redundant buton. 0x1 = pressed;"; CM SG 408 SideButton "0x1 = pressed;"; CM SG 408 SportButtonPressed "0x1 = pressed;";

Something i feared with multiline support. Seemingly ";" can appear in a message. actually adding comment to the signal and then failing at a ";" only line.

chrysler_pacifica_2017_hybrid_generated.dbc

VAL_ 320 ACC_OFF_REQ 2 "PERMANENT" 1 "TEMPORARY" 0 "NONE"

Val table without ; error?

\chrysler_ram_dt_generated.dbc

CM_ "Generated from _stellantis_common.dbc"

Comment without ";" error?

FORD_CADS.dbc

BADEF SG_ "GenSigMissingSourceValue" INT 0 1e+09;

the 1e+09 style is unsupported

ford_lincoln_base_pt.dbc

VALTABLE DcacRdy_D_Stat 7 "NotUsed_2" 6 "NotUsed_1" 5 "Faulted" 4 "ProtectionTempearture" 3 "ProtectionOverload" 2 "ProtectionGfci" 1 "Active" 0 "Idle";

Literally every single val table fails parsing in this file. I suspect the same error that @Whitehouse112 fixed for the Val_Parsing. Maybe want to look into this too?

gm_global_a_powertrain_generated.dbc

VAL_ 501 ManualMode 1 "Active" 0 "Inactive"

";" missing at or near end of file; Error? (near in mazda_2017.dbc)

mazda_2017.dbc

BO_ 1277 NEW_MSG10: 8 XXX SG NEW_SIGNAL3 : 31|8@0+ (1,0) [0|255] "" XXX SG NEW_SIGNAL2 : 15|8@0+ (1,0) [0|255] "" XXX SG counter : 7|8@0+ (1,0) [0|255] "" XXX

BO_ 1275 20175: 8 XXX SG counter : 4|5@0+ (1,0) [0|255] "" XXX

I suspect the fact that the message only contains numbers failing the regex?

tesla_radar_bosch_generated.dbc

CM BO 345 "Message sent every 20ms. Checksum : Sum of all first bytes + 0xc; place checksum in 4th octet";

Another ";" in a comment but worse as its not even directly followed by "

vw_mqb_2010.dbc

CM SG 1720 KBI_Reifenumfang_Sekundaer "Fahrzeuge mit unterschiedlichen Reifenumfängen Vorderachse / Hinterachse:

Primärachse: KBI_Reifenumfang Sekundärachse: KBI_Reifenumfang_Sekundaer ";

Empty line in comment allowed?

Uight commented 2 months ago

things that should probably be fixed:

the comment fails are mostly due to trying generic multiline support which doesnt realy work for comments. Are all comments ended by "; or atleast by " ; then this could still be done seperatly in the comment parsing; If not im not to sure about multiline support because currently the comment stuff works now it doesnt but other stuff has multiline support; On the other hand failing to parse comments is better than failing to parse other entries.

BO_TXBU 769 : NEO,ADAS;

Can do at some point before the 2.0 release i guess would be best. Should probably just be added to the messages Transmitters? Duplication handling?

BADEF SG_ "GenSigMissingSourceValue" INT 0 1e+09;

Would need a fancy regex change that i cant do. neither can the AI Overlord ;)

VALTABLE DcacRdy_D_Stat 7 "NotUsed_2" 6 "NotUsed_1" 5 "Faulted" 4 "ProtectionTempearture" 3 "ProtectionOverload" 2 "ProtectionGfci" 1 "Active" 0 "Idle";

@Whitehouse112 could probably adjust the regex for this; Just like in the VAL_ parsing?

BO_ 1277 NEW_MSG10: 8 XXX SG NEW_SIGNAL3 : 31|8@0+ (1,0) [0|255] "" XXX SG NEW_SIGNAL2 : 15|8@0+ (1,0) [0|255] "" XXX SG counter : 7|8@0+ (1,0) [0|255] "" XXX

BO_ 1275 20175: 8 XXX SG counter : 4|5@0+ (1,0) [0|255] "" XXX

Also a fancy regex change? This group ([a-zA-Z_][\w]*) probably expects atleast on letter. Not to sure though as i dont know why you cant just use [\w] alone. I do think that the name here is allowed. isnt it?

Adhara3 commented 2 months ago

Hi,

It's Saturday so I'll quickly scratch the surface here.

On the other hand failing to parse comments is better than failing to parse other entries.

We may discuss about this but for us that would be a regression.

; mid comment

I told you that multiline thing was not only a matter of finding ;.... :)

1e+09

I'm sure I saw a unit test yesterday where we were testing something very similar

Generally speaking, finding complete and trustable DBC specification has been a mess so we followed the specifications you can find in project's home page. For some tags that spec states that final ; I'd compulsory. There are also clear limitations on message names and other identifiers. We, nevertheless, often double checked opening DBC with Vector CANdb++. Sometimes it was strict, sometimes it was loose and accepted minor syntax errors.

it may be worth checking all these errors with both documentation and CANdb++ and see what happens. I won't assume that those DBC are all strictly formatted.

Since our observed is named Strict we could also think about making behavior configurable, something like parsing mode: strict, loose, mute (where mute is current Silent observer).

A

Uight commented 2 months ago

@Adhara3 there are test for sientificNotation but only for FLOAT and it only works for float. Im not sure if it is allowed for int. In this case it might since its only e+. Im not sure. Spec is unclear.

for the comment stuff thats why i said multiline support is a poc. I can let the nextline provider handle general multiline/multidefiniton per line support and handle comments specially still as before. im just not sure if i should. i could drop multidefinition per line support but thats also uncool. or if i see a ; in line i check for keyword after that but ill have to experiment some more. The basic dilema is that the NextLine provider can not really have the info if a ; should be the splitter between two definitions in one line or just a ; like in a comment. The most probable solution atm would be not to split a line if the part behind ; is not a new keyword. but that would still leaf the empty line problem. How would i do it if i had no code base: let each parser itself decide when its wants to stop reading and drop the newLineProvider

I know that many dbcs are flawed thats why i picked the points that in my opinions atleast seem to be reasonable and within spec in my second comment. I think these should all be within spec. im unable to verify with CANdb++ though.

Adhara3 commented 2 months ago

I suspect the fact that the message only contains numbers failing the regex?

C_identifier: a valid C_identifier. C_identifiers have to start with an alpha character or an underscore and may further consist of alpha-numeric characters and underscores. C_identifier = (alphachar | '') {alpha_numchar | ''} DBC_identifier: a C_identifier which doesn’t represent a DBC-Keyword.

From the spec an indentifier should at least start with letter or _. Is this rule followed? No idea. I'll check with candb

A

Uight commented 2 months ago

Was just thinking regarding to this:

Since our observed is named Strict we could also think about making behavior configurable, something like parsing mode: strict, loose, mute (where mute is current Silent observer).

I was first thingik you wanna swap the parser but i dont like that. however would it be possible to classify the pasing errors like a comment parsing error being => Info; a single for a val table not found like a warning and a signal or message error being critical. Kind of the same direction as your idea but might be a bit easier

Uight commented 2 months ago

@Adhara3 we checked the current code against some production dbc (so ones that are used in production systems and are parsed without error by the softwares we use. From the parsing errors we observed i can say that only 2 things are currently an actual problem.

stuff like this appears sometimes:

BO_TXBU 769 : NEO,ADAS;

Let me know if i should create a PR for this for 1.7 or later or not at all?

Stuff like this is really common: VALTABLE DcacRdy_D_Stat 7 "NotUsed_2" 6 "NotUsed_1" 5 "Faulted" 4 "ProtectionTempearture" 3 "ProtectionOverload" 2 "ProtectionGfci" 1 "Active" 0 "Idle";

As i have no idea how to adjust the regex as whitehouse did in i would appreciate it if someone couuld fix this.

The testcase for it would be an adjustment of "ValueTableDefinitionIsParsedAndCallsBuilder" where you just remove the trailing space before the ';'

I actually found one single comment in our files with a line ending in ';'. No big deal though even tough it fails parsing.

As for the message with message name only with numbers and '_'. two of three parsers do parse it but it never appears in the production files so doesnt matter.

BADEF SG_ "GenSigMissingSourceValue" INT 0 1e+09; Seems to be fully valid. parsed by all parsers. "Matlab, Goepel, python cantools, " But i dont need it as it doesnt appear in any of our files

Adhara3 commented 2 months ago

I created a couple of issues.

I actually found one single comment in our files with a line ending in ';'. No big deal though even tough it fails parsing.

If we go down this path the we should ignore de doc/spec at all and switch parsing target to parse what other parsers parse or provide mechanism to not be strict but keeping a strict mode. If a parser is parsing a wrong syntax it is not doing a good job for me, it is allowing bad habits. I'm fine to allow loose parsing but it should be a user choice, we should be strict when possible. My 2 cents.

2. Stuff like this is really common: VALTABLE DcacRdy_D_Stat 7 "NotUsed_2" 6 "NotUsed_1" 5 "Faulted" 4 "ProtectionTempearture" 3 "ProtectionOverload" 2 "ProtectionGfci" 1 "Active" 0 "Idle";

Yeah, understood. No clue how to solve it. I have some ideas, but it won't be quick, sorry.

Cheers A

Uight commented 2 months ago

If we go down this path the we should ignore de doc/spec at all and switch parsing target to parse what other parsers parse or provide mechanism to not be strict but keeping a strict mode. If a parser is parsing a wrong syntax it is not doing a good job for me, it is allowing bad habits. I'm fine to allow loose parsing but it should be a user choice, we should be strict when possible. My 2 cents.

As far as i understand the spec literally and char can appear in a comment as long as it is in between "". This includes ; and also at line ends. the actual comment end is something like this (regex style): "\s*; so " markes the end of the comment text then it can have zero or more spaces before a final; Just because ; is at a line end doesnt neccesarily mean the comment ends. But anyway this is such an edge case...

Adhara3 commented 2 months ago

If we go down this path the we should ignore de doc/spec at all and switch parsing target to parse what other parsers parse or provide mechanism to not be strict but keeping a strict mode. If a parser is parsing a wrong syntax it is not doing a good job for me, it is allowing bad habits. I'm fine to allow loose parsing but it should be a user choice, we should be strict when possible. My 2 cents.

As far as i understand the spec literally and char can appear in a comment as long as it is in between "". This includes ; and also at line ends. the actual comment end is something like this (regex style): "\s*; so " markes the end of the comment text then it can have zero or more spaces before a final; Just because ; is at a line end doesnt neccesarily mean the comment ends. But anyway this is such an edge case...

Sorry I misunderstood the issue. in CANdb++ both VAL_TABLE_ and CM_ must end with a ;. In case of CM_ the ; must follow the body closing " but this is important for supporting multiline *the fact that we can assume that the final ; is there otherwise we fail). The comment body text having a ; or event having a line ending with ; must be supported. Maybe it's rare, but must be supported anyway.

Anyway, I'm starting to have some ideas, I need to write code and see. Cheers

Uight commented 2 months ago

Anyway, I'm starting to have some ideas, I need to write code and see.

You can still check out my multiline code but it doesnt go for line ends but instead line starts. but it works. If we keep the current parser we could change comment to look for "; for line end instead of only ;. that would fix that.

Adhara3 commented 2 months ago

Anyway, I'm starting to have some ideas, I need to write code and see.

You can still check out my multiline code but it doesnt go for line ends but instead line starts. but it works. If we keep the current parser we could change comment to look for "; for line end instead of only ;. that would fix that.

I truly believe we should abandon the concept of INextLineProvider. The parser should not be involved in anthing related to line Anyway, I need to write down my design and see if I really like the design, we'll see. Let me think about it, then I'll list here 2 or 3 otpions I have in mind.

Cheers

Adhara3 commented 2 months ago

Dear all,

I close this issue to avoid noise as the two main problems highlighted here are being addressed in the scope other issues (some completed, some ongoing).

Thanks for your effort A

EFeru / DbcParser

Parsing errors #82