JalmarazMartn / customDiagnostics

1 stars 0 forks source link

Multi-Line RegEx not working as in VS code? #3

Open MarvinWagner-BSS opened 5 days ago

MarvinWagner-BSS commented 5 days ago

My rule (to convert function definitions with multi-line parameter lists into a single line) =>

{
    "name": "One-Line Function Definitions",
    "searchExpresion": "procedure (.+)[(]([^)\\n]+)\\n",
    "replaceExpression": "procedure $1($2"
}

The unescaped search expression of that rule is: (for testing in VS code)

procedure (.+)[(]([^)\n]+)\n

It should match function definitions like these:

procedure DirtyExample(param1: Integer;
    param2: Integer;
        param3: Code[20]) : Boolean
begin
end;

procedure DirtyExample2(param1: Integer;
    param2: Integer;
        param3: Code[20]) : Boolean
var
    Test: Boolean;
begin
end;

The replacement may need multiple passes, but at the end after standard-formatting, the result should be:

procedure DirtyExample(param1: Integer; param2: Integer; param3: Code[20]) : Boolean
begin
end;

procedure DirtyExample2(param1: Integer; param2: Integer; param3: Code[20]) : Boolean
var
    Test: Boolean;
begin
end;

This search&replace pattern works when using the standard RegEx search inside VS code, and on regex101 but it does not work as a JAM bulk replacement rule. It seems to change nothing. (but it does register a match in the log) One-Line Function Definitions applied in file:///c%3A/.../_TEST.al

Any idea why and can you fix it?

~edit / solution~ I found a solution by matching for \r\n instead of just \n. Unescaped pattern => procedure (.+)[(]([^)\r\n]+)\r\n This does not work in VS code or on regex101, but works in a bulk replacement rule.

The newlines should actually be windows newlines as I'm on windows (though I didn't check with a hexeditor), but VS code and regex101 seem to be a bit more lenient with that.

I'd still vote for aligning the logic with VS code and regex101 defaults. And while you are at it, a multi-pass setting in the rules would be nice to have. ;)

JalmarazMartn commented 5 days ago

I will check it, and tell you. Thank you for the report.

JalmarazMartn commented 4 days ago

Thank you very much!!!

I remembered that I did once, and I have read your solution with \r\n. 

As you said the final rule is:

    {
        "name": "Multiline procedure to single line",
        "searchExpresion": "procedure (.+)[(]([^)\\r\\n]+)\\r\\n",
        "replaceExpression": "procedure $1($2"
    }

I already had it in file: https://github.com/JalmarazMartn/customDiagnostics/blob/main/FileExamples/Reemplazos/OtherReplacements.json

    {
        "name": "fields setting from field definition",
        "searchExpresion": "field\\(\\d+;\\s*(.+)\\s*;.+\\)",
        "replaceExpression": "ToTable.$1 := FromTable.$1;"
    },
    {
        "name": "fields setting from field definition remove other",
        "searchExpresion": "{([^{]|\\r\\rE\\n)+}(\\s|\\n)*",
        "replaceExpression": ""
    },

I don't understand your suggestion: what is your suggested improvement?

MarvinWagner-BSS commented 4 days ago

As you said the final rule is:

    {
        "name": "Multiline procedure to single line",
        "searchExpresion": "procedure (.+)[(]([^)\\r\\n]+)\\r\\n",
        "replaceExpression": "procedure $1($2"
    }

Yes, thats the rule that works for me.

I already had it in file: https://github.com/JalmarazMartn/customDiagnostics/blob/main/FileExamples/Reemplazos/OtherReplacements.json

    {
        "name": "fields setting from field definition",
        "searchExpresion": "field\\(\\d+;\\s*(.+)\\s*;.+\\)",
        "replaceExpression": "ToTable.$1 := FromTable.$1;"
    },
    {
        "name": "fields setting from field definition remove other",
        "searchExpresion": "{([^{]|\\r\\rE\\n)+}(\\s|\\n)*",
        "replaceExpression": ""
    },

I don't really see the similarity there. Except those do use \r and \n somewhere 🤔 (but I haven't tested those so far)

I don't understand your suggestion: what is your suggested improvement?

1

I'm not really sure how to improve the main issue easily. Matching for \n: LF Matching for \r\n: CRLF (and the same applies to VS code regex search - it only sees \n where the bulk rule sees \r\n)

It doesn't seem possible to know if a rule is going to work before testing it with the actual extension.

I couldn't make a difference to the result no matter which regex options on regex101. It might just be a difference in the regex library thats used. (theres also differences between just VS code and regex101, e.g. with \X and some others not being supported in VS code)

Maybe regex test window using the builtin regex library would help there. (to test the matching first without any replacements) But it seems like a lot to ask for. The extension is already very useful as it is 🤣

A hacky option could be to simply replace "\r\n" newlines by just "\n" first in the whole document. Then matching everything with just "\n" should work the same as in VS code or on regex101. And afterwards restore newlines to their original state? (if anything was actually modified from "\r\n" to "\n" in the first step?)

2

My other suggestion was to add a new property to the rules. Like after "replaceExpression". E.g. a "repeatCount". 0 = Repeat (recurse) forever until no more matches are found in the resulting output? (you might still want to implement a configurable limit to prevent accidental endless loops) 1 = Apply rule once, like it currently is. This should be the default if the new property isn't defined. N = Repeat N times, maybe for cases that do need multiple passes but where we do not carefully design our expression to stop matching at some point. If everything is good after 10x repeats, thats an option.

Otherwise we may currently have to duplicate a rule 10x times and insert it in the ruleset under 10x different names. Or would it also work to put in just the same rule multiple times in the same ruleset? I haven't tried that so far, but it would also be an acceptable workaround that doesn't require having actual duplicated rules.

3

One more thing that may be nice, is access to the regex options. The regex101 mode /s(ingle line) would sometimes also be useful in multi line content matching. (but other times it may hurt the results, so for sure keep it optional)

JalmarazMartn commented 3 days ago

I love this!!! You give me a lot of stuff to think about!! The issue will remain open and I will be trying all these points. Thank you, Marvin Wagner.

JalmarazMartn commented 3 days ago

Good morning: you have a new setting in replace rules for numberOfRepetitions (you can see at README), to answer your second suggestion. But the one more attractive is the first one: build the rule tester. Is an idea that time ago round my head.

MarvinWagner-BSS commented 3 days ago

Thanks. I've tested the numberOfRepetitions. Works perfectly👍

The simplest form of rule tester I could imagine would be a rule selection command from the Ctrl+Shift+P menu that simply prints all relevant match and group information (positions, indexes/names, contents) to the console. But some GUI with near-real-time response to regex changes could really speed up testing/development. Take your time, its not urgent 😉