Critical Bug when Parsing Floating Point Numbers

MerlinofMines / EasyCommands

GitHub repository for in-game scripts built by MerlinofMines. Uses MDK to deploy to Space Engineers.

GNU General Public License v3.0

Critical Bug when Parsing Floating Point Numbers #170

Closed by jgersti 2 years ago

jgersti commented 2 years ago

When tokenizing a floating point number that is directly preceded or followed by a character from separateTokensSecondPass, the floating point number is decomposed. For example, Print 2.0+1 is tokenized as Print "2" Dot "0" Add "1".

In #169 the situation is even worse, since the component delimiter : is tokenized in the second pass. This means a vector containing variables and floating point numbers cannot be defined, and the dot product between two such vectors inside the success branch of a ternary condition might be ambiguous. For example, Print pi:0.5:1 is tokenized as Print "pi" : "0" Dot "5" : "1".
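
The failure mode described above can be reproduced with a minimal sketch. This is not the EasyCommands tokenizer: it is written in Python for illustration, the contents of SECOND_PASS are assumed from the characters named in this thread, and tokenize_current, is_primitive and second_pass are hypothetical names. It only models the point that a whitespace-delimited word is checked as a primitive as a whole, and is split on every second-pass character if that check fails.

```python
import re

# Assumption: stand-in for separateTokensSecondPass, using the characters
# mentioned in this thread (including ":" from #169).
SECOND_PASS = "<>=&|?:+-."

# Assumption: a simplified notion of a numeric primitive.
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def is_primitive(word):
    """True only if the whole word is a number."""
    return NUMBER.fullmatch(word) is not None


def second_pass(word):
    """Split a word on every second-pass character, keeping the separators."""
    parts = re.split("([" + re.escape(SECOND_PASS) + "])", word)
    return [p for p in parts if p]


def tokenize_current(line):
    """Hypothetical model of the current order: primitives first, then split."""
    tokens = []
    for word in line.split():
        if is_primitive(word):
            tokens.append(word)               # "2.0" on its own survives
        else:
            tokens.extend(second_pass(word))  # "2.0+1" is split on "." too
    return tokens


print(tokenize_current("Print 2.0 + 1"))   # ['Print', '2.0', '+', '1']
print(tokenize_current("Print 2.0+1"))     # ['Print', '2', '.', '0', '+', '1']
print(tokenize_current("Print pi:0.5:1"))  # ['Print', 'pi', ':', '0', '.', '5', ':', '1']
```

With spaces around the operator the number is recognized before any splitting happens, which is why the bug only shows up when the number touches a second-pass character.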

MerlinofMines commented 2 years ago

Indeed, this one looks like a bit of a doozy.

Let me give this one some thought. This sort of hints at the 2 pass approach being insufficient.

Perhaps we need to introduce parsing rules, with priority ordering, similar to how the parameter processor & engine work. Doesn't need to be near as complicated though.

Perhaps a delegate Processor taking List and returning List.

First phase pulls out firstPassTokens
Second phase can look for floating point numbers
Third phase can parse other primitive types
Fourth phase can parse secondPassTokens

That approach would let us add more parsing rules iteratively as needed. Hopefully it can be done without too many characters. Since only one line is parsed per tick (and is processed in a different tick), I don't think the additional steps for processing the tokens will cause script parsing complexity issues, but I guess we'll see.

Thoughts?
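
One way to picture the phased idea is a list of processors, each taking a list of tokens and returning a new list, applied in priority order. The sketch below is in Python rather than the project's own code, and everything in it is an assumption made for illustration: the Token class, the two phase factories, the regular expressions, and the contents of the first-pass and second-pass character sets.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    text: str
    final: bool = False  # True once some phase has claimed this token


Processor = Callable[[List[Token]], List[Token]]


def separating_phase(chars: str) -> Processor:
    """Phase that splits open tokens on single characters and claims them."""
    pattern = re.compile("([" + re.escape(chars) + "])")

    def run(tokens: List[Token]) -> List[Token]:
        out = []
        for tok in tokens:
            if tok.final:
                out.append(tok)
                continue
            for part in pattern.split(tok.text):
                if part:
                    out.append(Token(part, final=len(part) == 1 and part in chars))
        return out

    return run


def matching_phase(regex: str) -> Processor:
    """Phase that claims every regex match inside open tokens."""
    pattern = re.compile(regex)

    def run(tokens: List[Token]) -> List[Token]:
        out = []
        for tok in tokens:
            if tok.final:
                out.append(tok)
                continue
            pos = 0
            for m in pattern.finditer(tok.text):
                if m.start() > pos:
                    out.append(Token(tok.text[pos:m.start()]))
                out.append(Token(m.group(), final=True))
                pos = m.end()
            if pos < len(tok.text):
                out.append(Token(tok.text[pos:]))
        return out

    return run


# Priority order follows the four phases listed above; the character sets
# and patterns are assumed, not taken from the repository.
PHASES: List[Processor] = [
    separating_phase("()[],"),                  # hypothetical firstPassTokens
    matching_phase(r"\b\d+\.\d+\b"),            # floating point numbers
    matching_phase(r"\b(?:\d+|true|false)\b"),  # other primitives
    separating_phase("<>=&|?:+-."),             # assumed secondPassTokens
]


def tokenize(line: str) -> List[str]:
    tokens = [Token(word) for word in line.split()]
    for phase in PHASES:
        tokens = phase(tokens)
    return [tok.text for tok in tokens]


print(tokenize("Print 2.0+1"))     # ['Print', '2.0', '+', '1']
print(tokenize("Print pi:0.5:1"))  # ['Print', 'pi', ':', '0.5', ':', '1']
```

Adding a rule then amounts to inserting another entry into PHASES at the right priority, which matches the "add more parsing rules iteratively" goal.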

jgersti commented 2 years ago

My suggestion to keep changes pretty minimal is to split the tokenizing into the following steps:

1. tokenize quoted substrings
2. separate firstPassTokens and tokenize matches
3. match and tokenize primitives (last chance to match primitive vectors and floating point numbers with an explicit sign)
4. separate <, >, =, &, |, ?, :, +, - and tokenize matches
5. match and tokenize primitives (last chance to match floating point numbers)
6. separate on . and tokenize matches
7. tokenize the rest

edit: + and - need to also be separated in step 4.
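
The seven-step order above can be sketched as follows. This is a Python illustration under stated assumptions, not the project's implementation: FIRST_PASS is a made-up stand-in for firstPassTokens, NUMBER is a simplified primitive matcher that ignores vectors, and step 7 simply returns the remaining text instead of mapping it to command tokens. All function names are hypothetical.

```python
import re

FIRST_PASS = "()[],"     # assumption: stand-in for firstPassTokens
OPERATORS = "<>=&|?:+-"  # step 4, as listed in the thread
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")  # assumption: numbers only


def each_open(tokens, fn):
    """Apply fn to every token that no earlier step has finished."""
    out = []
    for text, done in tokens:
        out.extend([(text, True)] if done else fn(text))
    return out


def quoted(line):
    """Step 1: quoted substrings are finished; the rest splits on whitespace."""
    out = []
    for i, chunk in enumerate(line.split('"')):
        if i % 2 == 1:
            out.append((chunk, True))
        else:
            out.extend((word, False) for word in chunk.split())
    return out


def separate(chars):
    """Build a step that splits on the given characters and finishes them."""
    pattern = re.compile("([" + re.escape(chars) + "])")

    def fn(text):
        return [(p, len(p) == 1 and p in chars) for p in pattern.split(text) if p]

    return fn


def primitive(text):
    """Finish a token only if all of it is a number."""
    return [(text, NUMBER.fullmatch(text) is not None)]


def tokenize(line):
    tokens = quoted(line)                             # step 1
    tokens = each_open(tokens, separate(FIRST_PASS))  # step 2
    tokens = each_open(tokens, primitive)             # step 3: signed numbers
    tokens = each_open(tokens, separate(OPERATORS))   # step 4
    tokens = each_open(tokens, primitive)             # step 5: plain floats
    tokens = each_open(tokens, separate("."))         # step 6
    return [text for text, _ in tokens]               # step 7 (simplified)


print(tokenize("Print 2.0+1"))     # ['Print', '2.0', '+', '1']
print(tokenize("Print pi:0.5:1"))  # ['Print', 'pi', ':', '0.5', ':', '1']
print(tokenize("Print -2.5"))      # ['Print', '-2.5']
print(tokenize("Print a.b"))       # ['Print', 'a', '.', 'b']
```

In this sketch step 3 is what keeps -2.5 together as one token, and deferring the split on . to step 6 is what keeps 2.0 and 0.5 intact.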

Currently the following steps are performed:

1. tokenize quoted substrings
2. separate firstPassTokens and tokenize matches
3. match and tokenize primitives
4. separate secondPassTokens
5. tokenize the rest

As a side note: maybe call these functions TokenizeXXX, since that is what they actually do.

MerlinofMines commented 2 years ago

Ya I agree the . would need to be separated last, after checking for primitives.

If we remove Vector parsing of floats from parse primitive (which I think we should do) then this simplifies even more, as you don't need to do the primitive tokenization in step 3.

1. tokenize quoted substrings
2. separate firstPassTokens
3. separate secondPassTokens (other than .)
4. parse primitives
5. separate .

Thoughts?

Tokenize seems fine as a name.