fesch / Structorizer.Desktop

Structorizer is a little tool which you can use to create Nassi-Shneiderman Diagrams (NSD).
https://structorizer.fisch.lu
GNU General Public License v3.0

Code import failure on string literals containing tab characters #1151

Closed codemanyak closed 8 months ago

codemanyak commented 8 months ago

Java import fails if the source file contains string literals in which // occurs as a substring:

error.lexical in file "D:\SW-Produkte\Structorizer\tests\Issue1151_Java_import_mystic_error\PHPGenerator_2.java"

Preceding source context:
    1:   public class Test1151 
    2:   {
    3:   
    4:       protected String getInputReplacer(boolean withPrompt)
    5:       {
    6:           return » "$2 = \\$_REQUEST[$1];    // TODO form a sensible input opportunity";

Found token (Error) (")

Expected: 

The same happens if the string literal contains /*, whether or not */ follows within the string literal. Apparently, comment detection does not work correctly with the GOLD engine used in Structorizer. Interestingly, the GOLDBuilder successfully parses the very same files using the very same grammar. Hence, the Structorizer hack that associates comments with productions during parsing may be to blame for the bug.

codemanyak commented 8 months ago

It turned out that the Structorizer hack does not cause the problem: it occurs even if the hack is disabled. This means the problem is in the engine itself (which Ralph Iden derived from an open-source version 5.0.0, whereas the GOLDBuilder represents version 5.2.0, for which no source code is publicly available). But the condition for the parsing failure is somewhat more complicated: the string literal must also contain a tab character before the comment symbol for the error to occur! And then it applies to C sources as well, of course. (Pascal import is not affected; there, tab characters are stripped from string literals entirely, which isn't desirable either, btw.)

codemanyak commented 8 months ago

It wasn't the comment symbol at all; the mere occurrence of a tab character in the string literal causes the failure. And the code passed the GOLDBuilder only because it automatically replaces tab characters with blanks when loading the file 😠. Hence, it's just the grammars that are to blame. Apparently, the character set {all_printable} does not include the tab character.

Workaround: Replace tab characters in string literals with the escape sequence \t.
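
For files with many affected literals, the workaround above could be automated before import. The following is only a minimal sketch and not part of Structorizer: the class `TabEscaper` and its method `escapeTabsInStringLiterals` are hypothetical names. It assumes ordinary double-quoted, single-line string literals and does not handle quote characters inside comments or character literals such as `'"'`.

```java
// Hypothetical helper, not part of Structorizer: rewrites raw tab characters
// inside double-quoted string literals as the two-character escape \t.
// Assumption: quotes only ever delimit single-line string literals
// (quotes in comments or in char literals like '"' are not handled).
final class TabEscaper {

    static String escapeTabsInStringLiterals(String source) {
        StringBuilder out = new StringBuilder(source.length() + 16);
        boolean inString = false; // currently inside "..."
        boolean escaped = false;  // previous character was an unescaped backslash
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (!inString) {
                if (c == '"') {
                    inString = true;       // opening quote
                }
                out.append(c);             // tabs outside literals stay untouched
            } else if (escaped) {
                escaped = false;           // character after a backslash is copied as-is
                out.append(c);
            } else if (c == '\\') {
                escaped = true;            // start of an escape sequence
                out.append(c);
            } else if (c == '"' || c == '\n') {
                inString = false;          // closing quote, or unterminated literal
                out.append(c);
            } else if (c == '\t') {
                out.append("\\t");         // raw tab becomes backslash + 't'
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A line shaped like the one from the error report, with a raw tab
        // in the literal and another one after the semicolon.
        String line = "return \"$2 = x;\t// TODO\";\t// trailing comment";
        System.out.println(escapeTabsInStringLiterals(line));
    }
}
```

Only the tab inside the literal is rewritten; indentation and other whitespace outside string literals are left as they are, so the converted file should differ from the original only in the affected literals.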

codemanyak commented 8 months ago

Grammars for Java SE8 and ANSI-C99 now allow tab characters in string literals. Ready for version 3.32-19.