Roald87 / TcBlack

Opnionated code formatter for TwinCAT.
MIT License
103 stars 11 forks source link

Regex for variable definition fails on string which includes semi colon or closing bracket #35

Closed kdorsel closed 4 years ago

kdorsel commented 4 years ago

Losing the initialization portion if there's a semicolon inside the string image

My initial idea was negative lookahead in the Tokenize method. https://github.com/Roald87/TcBlack/blob/8440ab05918b38b5ee59a99f3f01ba8b9237dab7/src/TcBlack/VariableDeclaration.cs#L77

(?:;(?!.*;)) This negative lookahead would fail if there's a semi colon in the comment portion of the line also.

But that gives me even weirder results with the assignment operator now showing up. image

Roald87 commented 4 years ago

Thanks for reporting! I earlier already noticed strange behavior when there is a closing bracket in the string ')'. You can see the failing ones and a few working ones here https://regex101.com/r/HN9oLJ/3

I tried the following to fix it, but it fails on different ones:

  1. Replace \w+\(.*\) with \w+\([^)]\) then WriteContents: STRING(1) := ')'; passes, but fbSample : FB_Sample(nId_Init := 11, fIn_Init := 33.44) ; fails https://regex101.com/r/HN9oLJ/6.
  2. Replace \w+\(.*\) with \w+\(.?\) then WriteContents: STRING(1) := ')'; passes, but fbSample : FB_Sample(nId_Init := 11, fIn_Init := 33.44) ; also fails https://regex101.com/r/HN9oLJ/7.

I also tried with your negative lookahead but that also doesn't lead to the desired result https://regex101.com/r/HN9oLJ/4

It's a though one.

kdorsel commented 4 years ago

Ok, this was an interesting one!

This seems to work. Add a negative look ahead/behind for the quotes (?<!['"]) and (?!["']) and also using the negative lookahead for the semi colon (?:;(?!.*;)). This will still fail is there's a semi colon in the comment.

I also removed the (?s) single line modifier. I'm not too sure the purpose of this one... But if needed the negative look ahead just needs to be modified to not match new lines with the dot. https://regex101.com/r/HN9oLJ/8

This is my attempt to solve the semi colon in the comment, but still needs some work... https://regex101.com/r/HN9oLJ/9

Any other edge cases will surely pop up as needed 😆

Roald87 commented 4 years ago

Yeah its a tricky one 😁 . That's why the tests are so convenient. It often happens that a small change of the regex pattern can have very large consequences for other variable declarations.

What I usually do is:

  1. Add a failing test and run it.
  2. Use regex101.com and/or https://www.debuggex.com/ to find the right pattern with a few examples
  3. Run the tests again. If it fails go to two. Else 🥳
kdorsel commented 4 years ago

I think this might be easier to do in two steps. The first step would be to check for a string like initializing. Either ' or ". If found remove it and apply the current regex to part it out. The string check would have to make sure that any FB initializing with string values don't get caught.

Because the major issue I see right now is supporting those special characters inside a string. ) and ;. https://regex101.com/r/EzvPSi/2

Roald87 commented 4 years ago

That seems like a good solution. Or else the regex will become very complex.

kdorsel commented 4 years ago

Ok, I'll look at the two step solution and open up a PR.