SPF-OST / pytrnsys

Package that provides functionality to run and process, plot and report TRNSYS simulations
https://pytrnsys.readthedocs.io
GNU General Public License v3.0

Failure to parse `ddck` file. #176

Open zuckerruebe opened 9 months ago

zuckerruebe commented 9 months ago

As a user, I'd like pytrnsys to parse the attached ddck file correctly and to replace :unitAssigned correctly.

Background

TRNSYS' deck grammar is context-sensitive. The LABELS "directive" is a prime example of that. The following is correct "TRNSYS syntax":

UNIT 1 TYPE 2
INPUTS 1
0 0
LABELS 6
HELLO
EQUATIONS 1
x = 7

resulting in the six labels HELLO, EQUATIONS, 1, x, = and 7.

Compare this to

UNIT 1 TYPE 2
INPUTS 1
0 0
LABELS 1
HELLO
EQUATIONS 1
x = 7

which would result in the single label HELLO and the equation x = 7. However, it's not possible to write an unambiguous context-free grammar that parses both versions correctly, because the correct parse depends on context: the number of labels expected (6 or 1, here).
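To make the dependence on the declared count concrete, here is a minimal Python sketch of a count-aware reader. It is not pytrnsys code: the helper name `read_labels_and_rest` and the plain whitespace tokenization are assumptions made for illustration only.

```python
def read_labels_and_rest(deck_text):
    """Split a deck snippet into its labels and the tokens that follow them.

    Hypothetical helper: tokens are whitespace-separated, and the integer
    after LABELS decides how many of the following tokens are labels.
    """
    tokens = deck_text.split()
    start = tokens.index("LABELS")
    count = int(tokens[start + 1])
    first = start + 2
    available = len(tokens) - first
    if count > available:
        raise ValueError(f"LABELS declares {count} labels but only {available} tokens follow")
    labels = tokens[first:first + count]
    rest = tokens[first + count:]
    return labels, rest


HEADER = "UNIT 1 TYPE 2\nINPUTS 1\n0 0\n"
BODY = "HELLO\nEQUATIONS 1\nx = 7\n"

# Same text after the LABELS line, different declared count.
six_labels, six_rest = read_labels_and_rest(HEADER + "LABELS 6\n" + BODY)
one_label, one_rest = read_labels_and_rest(HEADER + "LABELS 1\n" + BODY)
```

With `LABELS 6` every remaining token is consumed as a label and nothing is left over; with `LABELS 1` only HELLO is a label and the EQUATIONS block remains to be parsed. The identical token sequence gets two different meanings, which is exactly what a context-free grammar cannot express.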

Most grammar compilers out there are for context-free grammars (the library we're using for parsing ddcks, lark, definitely is), and most programming languages have a context-free grammar. More precisely, lark can accept ambiguous grammars; it will then try to generate all possible parses of the input and finally select the most "probable" one as the parse it returns to the user. I don't know how it deems one parse more "probable" than another. We should think of this as an implementation detail.

That's what makes this bug very insidious. Just by changing the version of the lark package we depend upon, the user can get different results. Rearranging things in the ddck file can also lead to different results: moving the UNIT with LABELS to the very end of the file might save the day for the attached example. So things are very brittle, and slow on top of that. The slowness is also a result of lark having to generate all possible parses of the input.
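The ambiguity can be illustrated without lark. The sketch below assumes a toy, count-blind grammar ("one or more labels, optionally followed by an EQUATIONS block") and lists every label count such a grammar could accept. The function name and the toy grammar are assumptions, not a description of lark's internals or of the pytrnsys grammar.

```python
def candidate_label_counts(tokens_after_labels):
    """Return every label count k that a count-blind grammar could accept.

    Assumed toy grammar: one or more labels, optionally followed by a block
    that starts with the keyword EQUATIONS and an integer.
    """
    candidates = []
    for k in range(1, len(tokens_after_labels) + 1):
        rest = tokens_after_labels[k:]
        rest_is_empty = not rest
        rest_is_equations = (
            len(rest) >= 2 and rest[0] == "EQUATIONS" and rest[1].isdigit()
        )
        if rest_is_empty or rest_is_equations:
            candidates.append(k)
    return candidates


tokens = "HELLO EQUATIONS 1 x = 7".split()
counts = candidate_label_counts(tokens)
```

For the example from the background section, `counts` contains both 1 and 6: both readings are valid for a grammar that ignores the declared number, so something else has to break the tie. Every further UNIT with LABELS in a file adds more such choice points for the parser to explore.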

Possible remedies

  1. Write your own context-sensitive parser (started here but very early days)
  2. Extend the lark grammar to not allow "keywords" like EQUATIONS for LABELS, etc.

Option 1 would have the advantage of accuracy and speed, and the disadvantage of requiring quite some work. Option 2 would be less work, but would still potentially not be very fast and may not ultimately be the most flexible way.

Only option 1 would be able to ensure the right number of equations, labels, inputs, etc. at the parse stage. This is not possible with option 2.
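As a rough illustration of what option 2 amounts to, the following sketch stops collecting labels at the first keyword. The keyword set is an assumed, incomplete list, and `read_labels_without_keywords` is a hypothetical name; the real change would live in the lark grammar rather than in Python code.

```python
# Assumed, incomplete keyword list for illustration only.
KEYWORDS = {"UNIT", "TYPE", "INPUTS", "LABELS", "EQUATIONS"}


def read_labels_without_keywords(tokens_after_labels):
    """Take tokens as labels until the first keyword; the declared count is ignored."""
    labels = []
    for token in tokens_after_labels:
        if token in KEYWORDS:
            break
        labels.append(token)
    rest = tokens_after_labels[len(labels):]
    return labels, rest


tokens = "HELLO EQUATIONS 1 x = 7".split()
labels, rest = read_labels_without_keywords(tokens)
```

This removes the ambiguity, since HELLO is always the only label here, but the result is the same whether the deck said `LABELS 1` or `LABELS 6`. A deck that legitimately uses a keyword as a label would be read differently from how TRNSYS reads it, and a wrong count would go unnoticed at the parse stage.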

zuckerruebe commented 9 months ago

@ahobeost FYI