Add support for boxes and templates

shigma commented 6 years ago

Support built-in characters and character encoding in strings.

  "\[alpha]"
(* ^^^^^^^^ constant.character.built-in.wolfram *)

  "\:123456"
(* ^^^^^^ constant.character.encoding.wolfram *)

Not treat `\` as escape anymore, but special characters like `\n` will be recognized as escapes as usual.

  "This is\n\a string. (* not a comment *)"
(*^ punctualation.defination.string.begin *)
(*        ^^ constant.character.escape *)
(*          ^^ string.quoted *)

Support string templates.

  StringTemplate["Value `a`: <* Range[#n] *>."][<|"a" -> 1234, "n" -> 3|>]
(*                      ^^^ variable.parameter *)
(*                           ^^ keyword.operator.template-expression *)

Support string representation of boxes.

  "box 1: \!\(x\^2\); box 2: \(y\^3\) "
(*        ^ keyword.operator.string-box *)
(*             ^^ keyword.operator.x-scriptBox *)
(*                           ^^^^^^^^ string.quoted *)

  \( \)
(*^^ punctuation.section.box.begin.wolfram *)
(*^^^^^ meta.box.wolfram *)
(*   ^^ punctuation.section.box.end.wolfram *)

chere005 commented 6 years ago

At first glance it looked like the escape \ fix was correct, but now I'm not so sure:

chere005 commented 6 years ago

I'm not sure I see a difference in:

  "box 1: \!\(x\^2\); box 2: \(y\^3\) "
(*        ^ keyword.operator.string-box *)
(*             ^^ keyword.operator.x-scriptBox *)
(*                           ^^^^^^^^ string.quoted *)

  \( \)
(*^^ punctuation.section.box.begin.wolfram *)
(*^^^^^ meta.box.wolfram *)
(*   ^^ punctuation.section.box.end.wolfram *)

Should I?

chere005 commented 6 years ago

This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

shigma commented 6 years ago

This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

Emmm, it does. How do you plan to solve it?

shigma commented 6 years ago

I'm not sure I see a difference.

What do you mean by "difference"?

shigma commented 6 years ago

At first glance it looked like the escape \ fix was correct, but now I'm not so sure:

It looks good to me.

chere005 commented 6 years ago

I'm not sure I see a difference.

What do you mean by "difference"?

When I pasted these code snippets and applied your changes, the highlighting didn't change at all.

chere005 commented 6 years ago

This breaks strings like "FOO`BAR`BAZ`" where this represents a context, which is a big regression that must be fixed before this is merged

Emmm, it does. How do you plan to solve it?

I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:

(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])

It seems to work but @batracos didn't like it. @Shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support

chere005 commented 6 years ago

At first glance it looked like the escape \ fix was correct, but now I'm not so sure:

It looks good to me.

Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.

shigma commented 6 years ago

I'm not sure I see a difference.

What do you mean by "difference"?

When I pasted these code snippets and applied your changes, the highlighting didn't change at all.

It's wield. Syntax rules before this PR should not have been able to handle these syntaxes. Can you tell me which commit?

shigma commented 6 years ago

I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:
(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])
It seems to work but @batracos didn't like it. @Shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support

I'm not sure I get your point by (?<=[^a-zA-Z0-9$]|[\"\`]) but the thing is slots in a StringTemplate can be placed just between letters:

In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>]
Out[11]= "FOO-BAZ`"

The syntax for python didn't work out a solution for string templates, either:

str = 'I just want a {plain} brace'
                   # ^^^^^^^ constant.other.placeholder.python

We don't show an attempt to format it and it appears as if we do.

If you really want to distinguish those behaviour, maybe we can just match the function outside the string, for example, treat strings inside a Begin function as "plain" to some extent. I think provide different functions with different inner context is a good solution to all the problems like this, and functions like Block has applied this solution before.

I'm interested in it but suggest not to do such things in this PR but open a new one instead because this PR has already does too many things.

shigma commented 6 years ago

Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.

Mathematica is not that strict:

In[21]:= "\a" // InputForm
(* Syntax::stresc: Unknown string escape \a. *)
Out[21]//InputForm= "\\a"

In[31]:= "\a"
Out[31]= "\\a"

Should out program be that strict? I just don't know.

If yes, maybe we can preserve the current rule and append the following rule:

match: \\[\s\S]
scope: invalid.character.escape.wolfram

chere005 commented 6 years ago

Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.

Mathematica is not that strict:
In[21]:= "\a" // InputForm
(* Syntax::stresc: Unknown string escape \a. *)
Out[21]//InputForm= "\\a"

In[31]:= "\a"
Out[31]= "\\a"
Should out program be that strict? I just don't know.

If yes, maybe we can preserve the current rule and append the following rule:
match: \\[\s\S]
scope: invalid.character.escape.wolfram

I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but \ on its own should never be unformatted

chere005 commented 6 years ago

I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:
(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])
It seems to work but @batracos didn't like it. @Shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support
I'm not sure I get your point by (?<=[^a-zA-Z0-9$]|[\"\`]) but the thing is slots in a StringTemplate can be placed just between letters:
In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>]
Out[11]= "FOO-BAZ`"
The syntax for python didn't work out a solution for string templates, either:
str = 'I just want a {plain} brace'
                   # ^^^^^^^ constant.other.placeholder.python
We don't show an attempt to format it and it appears as if we do.

If you really want to distinguish those behaviour, maybe we can just match the function outside the string, for example, treat strings inside a Begin function as "plain" to some extent. I think provide different functions with different inner context is a good solution to all the problems like this, and functions like Block has applied this solution before.

I'm interested in it but suggest not to do such things in this PR but open a new one instead because this PR has already does too many things.

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

shigma commented 6 years ago

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).

Is this OK?

shigma commented 6 years ago

I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but \ on its own should never be unformatted

I see. Hope the following rules are satisfactory: https://github.com/ViktorQvarfordt/Sublime-WolframLanguage/blob/4fa3e82dd4f677e88056e9f50f4093d74a00785c/WolframLanguage.sublime-syntax#L137-L161

I found only [a-zA-Z] except [nrtbf] can result in a error and [()"!^%&+_*@`/\\] after a back-slant will be recognized as a special character. Under other circumstances, any character after a back-slant will be recognized as usual:

In[69]:= "\8" // Characters
Out[69]= {"\\", "8"}

chere005 commented 6 years ago

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).

Is this OK?

This is a great idea, I'll test it soon.

chere005 commented 6 years ago

Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.

I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).

Is this OK?

I want to bring the idea of if there are spaces around the backticks back on the table. While it won't fix every other StringTemplate, it would fix most of the ones that I use. There would just be some instances (when the backticks where next to other characters but it was actually a StringTemplate and not inside the functions we whitelisted.. so very rare). I also suggest we add Success and Failure to the list of functions which commonly use StringTemplate and always turn on the syntax highlighting for this.

chere005 commented 6 years ago

I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \ followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but \ on its own should never be unformatted

I see. Hope the following rules are satisfactory:

Sublime-WolframLanguage/WolframLanguage.sublime-syntax

Lines 137 to 161 in 4fa3e82

escape characters
 - match: \\[-"nrtbf()!^%&+_*@`/\\] 
   scope: constant.character.escape.wolfram 
 - match: |- 
     (?x)( 
       \\[0-7]{3}| 
       \\\.[0-9A-Fa-f]{2}| 
       \\:[0-9A-Fa-f]{4} 
     ) 
   scope: constant.character.encoding.wolfram 
 - match: \\\[({{named_characters}})\] 
   scope: constant.character.built-in.wolfram 

 # invalid characters 
 - match: |- 
     (?x)( 
       \\[0-7]{1,2}(?=[^0-7])| 
       \\\.[0-9A-Fa-f]?(?=[^0-9A-Fa-f])| 
       \\:[0-9A-Fa-f]{0,3}(?=[^0-9A-Fa-f]) 
     ) 
   scope: invalid.character.encoding.wolfram 
 - match: \\\[\w+\] 
   scope: invalid.character.built-in.wolfram 
 - match: \\[a-zA-Z] 
   scope: invalid.character.escape.wolfram 
I found only [a-zA-Z] except [nrtbf] can result in a error and [()"!^%&+_*@`/\\] after a back-slant will be recognized as a special character. Under other circumstances, any character after a back-slant will be recognized as usual:
In[69]:= "\8" // Characters
Out[69]= {"\\", "8"}

At first glance this seems to work, but it does look like it can be cleaned up. Perhaps another thing to put an issue for, but I'll take another look next week

chere005 commented 6 years ago

To be honest I never work with boxes so I'm just not sure if this is expected:

Current Release:

This PR's latest commit:

chere005 commented 6 years ago

It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/ could you give some insight on everything past f in the list perhaps?

[nrtbf()"!^%&+_*@`/\\]

This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html apparently \000 is a thing too..

chere005 commented 6 years ago

I tried to respond to most questions tonight, I won't be back on over the weekend. I plan to leave your most recent commits in my local sublime for a few days at work next week before approving so I have time to see if I notice anything with my normal workflows.

shigma commented 6 years ago

To be honest I never work with boxes so I'm just not sure if this is expected:

Current Release:

This PR's latest commit:

Maybe I worked with boxes before and just forgot all about it later on ...

shigma commented 6 years ago

It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/ could you give some insight on everything past f in the list perhaps?

[nrtbf()"!^%&+_*@`/\\]

This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html apparently \000 is a thing too..

See this:

In[153]:= Characters /@ {"\+", "\-", "\>"}
Out[153]= {{"\+"}, {"\\", "-"}, {}}

shigma commented 6 years ago

@chere005 The following code may better depict the escaping behavior:

Quiet @ Last @ Reap[
    Scan[
        Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
        CharacterRange[33, 126]
    ],
    _,
    #1 -> StringJoin[#2] &
]

With the following result:

errored: .01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz
escaped: !"%&()*+/@\^_`bfnrt
non-escaped: #$',-89;=?]{|}~
disappeared: <>

shigma commented 6 years ago

Any questions @batracos ?

batracos commented 5 years ago

Sorry for the long time off the project. I simply did not have the bandwidth to spare. Except for those three tests not passing this seems good to go.

chere005 commented 5 years ago

Hold off on this until I can take another look..

Sent from my iPhone

On Mar 23, 2019, at 12:04 PM, batracos notifications@github.com wrote:

Sorry for the long time off the project. I simply did not have the bandwidth to spare. Except for those three tests not passing this seems good to go.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

chere005 commented 5 years ago

@chere005 The following code may better depict the escaping behavior:

Quiet @ Last @ Reap[
    Scan[
        Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
        CharacterRange[33, 126]
    ],
    _,
    #1 -> StringJoin[#2] &
]

With the following result:

* **errored:** `.01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz`

* **escaped:** `` !"%&()*+/@\^_`bfnrt ``

* **non-escaped:** `#$',-89;=?]{|}~`

* **disappeared:** `<>`

@batracos and I discussed this, in a string the only characters that should show up as valid escape characters is nrtbf Things like "\*" which don't complain end up with a string "\\*" which means the * was not actually escaped. Please make this change and fix the tests that @batracos mentioned are failing before I make another pass at reviewing this.

chere005 commented 5 years ago

@Shigma any plans to update this with the changes we mentioned?

ViktorQvarfordt / Sublime-WolframLanguage