GeSHi / geshi-1.0

Original version of Generic Syntax Highlighter for PHP
http://qbnz.com/highlighter/
GNU General Public License v2.0
168 stars 101 forks source link

TCL syntax formatting issue #9

Closed ghost closed 10 years ago

ghost commented 10 years ago

When entering TCL text, the following syntax [formatting] error occurs:

set ::a 5 uplevel #0 { set b [expr $a + 5 ] } return $::b

The "#0" and beyond, on that line, is returned as a COMMENT (color / italics) by GeSHi, which is WRONG! It's part of the UPLEVEL syntax (absolute global level). UPVAR also uses it.

uplevel <integer or "#0"> { code } upvar <integer or "#0"> remote_variable local_variable

Normally, a "#" at the beginning of a line indicates a comment. The only times a TCL comment could begin are: (1) first character of a line [ignoring whitespace] (2) after a semi-colon (";"), which is a command delimiter in TCL (3) after an opening [curly] brace

An example of the all three is:

if { $loop == 5 } { # This handles when ever $LOOP is = 5      incr a ; # Skip 5, continue on to 6, and continue with the loop      # We want to skip when $LOOP = 5 }

A consequence is an unintentional comment being allowed, though, some syntax / format parsers mis-handle it:

if { $loop == 4 } { # Do nothing # }

The comment SHOULD swallow the close-brace (everything from "#" to the linefeed), and, a "brace count mismatch" should ultimately occur (some versions of TCL do not see the error and safely (but improperly) protect the brace); most parsers are inconsistent with handling that situation.

Sei-Lisa commented 10 years ago

Can you please try this patch?

diff --git a/src/geshi/tcl.php b/src/geshi/tcl.php
index 59dfb5d..025ca74 100644
--- a/src/geshi/tcl.php
+++ b/src/geshi/tcl.php
@@ -53,7 +53,7 @@ $language_data = array (
     'COMMENT_SINGLE' => array(1 => '#'),
     'COMMENT_MULTI' => array(),
     'COMMENT_REGEXP' => array(
-        1 => '/(?<!\\\\)#(?:\\\\\\\\|\\\\\\n|.)*$/m',
+        1 => '/(?:^|\{|;)\s*#(?:\\\\\\\\|\\\\\\n|.)*$/m',
         //2 => '/{[^}\n]+}/'
         ),
     'CASE_KEYWORDS' => GESHI_CAPS_NO_CHANGE,
ghost commented 10 years ago

CC: Dustin

Hi.

I'm afraid it doesn't work. The command UPLEVEL still gets truncated because 1/2 the line is still treated as a comment.

http://i.imgur.com/0pHim84.png

Also, there's another [related] issue that GeSHi may miss in TCL:

set number 1 switch -exact -- [string tolower $number] { 1 { puts "Bingo!" } 2 { # Do nothing / ignore this (but honor the close brace to the SWITCH condition / the comment can NOT swallow this close-brace) -> } default { uplevel #0 { puts "Invalid value caused by: [info level [info level]]" } } } puts "Done with SWITCH command."

The SWITCH pattern 2 (which is nothing more than a 1-2 list element pair) is a comment within the element, so the while INNER comment is just that: a comment.

The close-brace at the end closes the element and, therefore, can't be swalled by the comment. In all other circumstances (where a list element isn't the target of the TCL parser), the close-brace WOULD be swallowed-up as part of the comment.

SWITCH [flags | --] { { match { program body } } { match { program body } } { match { program body } } { match { program body } } ... }

As such, SWITCH can be written in one line as such:

These do the same thing (believe it or not, so long as the variable is a simple digit)

switch -exact -- $variable { 1 { puts "1!" } 2 { puts "2!" } 3 { puts "3!" } default { puts "Number is not 1-3: $variable" } }

switch -exact -- $variable { [list 1 "puts 1!"] [list 2 "puts 2!"] [list 3 "puts 3!"] [list default "puts \"Number is not 1-3: $variable\""] }

switch -exact -- $variable { [list 1 "puts 1!" 2 "puts 2!" 3 "puts 3!" default "puts \"Number is not 1-3: ${variable}\""] }

I'm sure that makes your sytnax-hunting much more difficult. I'm just demonstrating how it works so you can see it better.

Me and my staff have tried to adjust the regexp to fix this, as well, but we aren't having any success. The key are seems to be that "\s" part, which allows for zero-or-more matches against \s (that "zero" part seems to be a problem). We've changed it to "+" and even to "[ ][^ ]#" where any non-line/statement-terminator should break the "#" from being a comment. Nothing's working on our end.

~ Mai

----- Original Message ----- From: Sei-Lisa To: GeSHi/geshi-1.0 Cc: MaiDomino Sent: Sunday, June 08, 2014 17:23 Subject: Re: [geshi-1.0] TCL syntax formatting issue (#9)

Can you please try this patch?

diff --git a/src/geshi/tcl.php b/src/geshi/tcl.php index 59dfb5d..025ca74 100644 --- a/src/geshi/tcl.php +++ b/src/geshi/tcl.php @@ -53,7 +53,7 @@ $language_data = array ( 'COMMENT_SINGLE' => array(1 => '#'), 'COMMENT_MULTI' => array(), 'COMMENT_REGEXP' => array(

Sei-Lisa commented 10 years ago

My bad. I should have tested it but I just gave it a stab in the dark. This one is actually tested. I made also a mistake in that the { or ; should not be considered a part of a comment; this patch fixes that case.

diff --git a/src/geshi/tcl.php b/src/geshi/tcl.php
index 59dfb5d..1d43279 100644
--- a/src/geshi/tcl.php
+++ b/src/geshi/tcl.php
@@ -50,10 +50,10 @@

 $language_data = array (
     'LANG_NAME' => 'TCL',
-    'COMMENT_SINGLE' => array(1 => '#'),
+    'COMMENT_SINGLE' => array(),
     'COMMENT_MULTI' => array(),
     'COMMENT_REGEXP' => array(
-        1 => '/(?<!\\\\)#(?:\\\\\\\\|\\\\\\n|.)*$/m',
+        1 => '/(?:^|(?<=\{|;))\s*#(?:\\\\\\\\|\\\\\\n|.)*$/m',
         //2 => '/{[^}\n]+}/'
         ),
     'CASE_KEYWORDS' => GESHI_CAPS_NO_CHANGE,

Please test it and I will submit a PR.

As for the other case you're reporting, I doubt GeSHi 1.0 is able to cope with that; dealing with different behaviours depending on the context is something that only GeSHi 1.1+ can do. https://github.com/GeSHi/geshi-1.1/

Unfortunately, there is no TCL mode for GeSHi 1.1 yet. Do you feel like writing one?

dustin-lennon commented 10 years ago

I have tested out this patch for the OP and have verified that it does work. I cannot respond for her in regards to GeSHi 1.1

Sei-Lisa commented 10 years ago

Note that technically, with that patch, the spaces preceding the comment are part of the comment. In practice, this will probably only be noticeable if the comment is highlighted with a different background, for example. If that needs to be fixed, this may work:

1 => '/(?:^|\{|;)\s*(#(?:\\\\\\\\|\\\\\\n|.)*)$/m',

But I'm unsure about whether it can interfere with other highlighting, since it doesn't use lookbehind (it can't be used here because it can't match a variable number of spaces).

As for the other testcase, please bear with me, I have no idea of TCL at all. Based on your example, I have made three cases that differ only in where # **COMMENT HERE** # is located.

Is this valid?

set number 1
switch -exact -- [string tolower $number] {
 1 { puts "Bingo!" }
 2 { # Do nothing / ignore this (but honor the close brace to the SWITCH condition / the comment can NOT swallow this close-brace) -> }
 default { uplevel #0 { puts "Invalid value caused by: [info level [info level]]" # **COMMENT HERE** # } }
}
puts "Done with SWITCH command."

And this?

set number 1
switch -exact -- [string tolower $number] {
 1 { puts "Bingo!" }
 2 { # Do nothing / ignore this (but honor the close brace to the SWITCH condition / the comment can NOT swallow this close-brace) -> }
 default { uplevel #0 { puts "Invalid value caused by: [info level [info level]]" } # **COMMENT HERE** # }
}
puts "Done with SWITCH command."

And how would this behave?

set number 1
switch -exact -- [string tolower $number] {
 1 { puts "Bingo!" }
 2 { # Do nothing / ignore this (but honor the close brace to the SWITCH condition / the comment can NOT swallow this close-brace) -> }
 default { # **COMMENT HERE** # uplevel #0 { puts "Invalid value caused by: [info level [info level]]" } }
}
puts "Done with SWITCH command."

My understanding, according to the rules you've described, is that neither is valid. The third one comes close, but there's a } too much because the { that opens it is commented out. Is that correct?

I see that even GitHub's own Pygments syntax highlighter doesn't treat that correctly. At a minimum, it does swallow the closing brace pointed to by the comment in the 2 case.

Sei-Lisa commented 10 years ago

Sorry, never mind. I have tried to make the switch work based on very simple assumptions, and even then, the regexp for GeSHi 1.0 gets convoluted quickly. This needs GeSHi 1.1 and someone who understands the grammar well (hint: I found the GeSHi 1.1 JavaScript highlighter a good base to learn by example).

I'll check the code to see if the second regexp I'm proposing for fixing the OP can work well to omit the spaces as part of the comment.

Sei-Lisa commented 10 years ago

The last regex I proposed doesn't work at all. GeSHi doesn't use only the captured part, it uses the whole expression always. It's hopeless. It needs context switching in order to have a chance to work.

I've submitted a PR with the fix for the OP's case.

Sei-Lisa commented 10 years ago

Since the remaining issue (the switch case) can't be fixed in GeSHi 1.0 and is therefore a wontfix, this issue can now be closed. Feel free to open a feature request in GeSHi 1.1 to include TCL; someone might take on it, and it does have chances to include context-dependent behavior such as that of switch.