Open shigma opened 6 years ago
At first glance it looked like the escape \ fix was correct, but now I'm not so sure:
I'm not sure I see a difference in:
"box 1: \!\(x\^2\); box 2: \(y\^3\) "
(* ^ keyword.operator.string-box *)
(* ^^ keyword.operator.x-scriptBox *)
(* ^^^^^^^^ string.quoted *)
\( \)
(*^^ punctuation.section.box.begin.wolfram *)
(*^^^^^ meta.box.wolfram *)
(* ^^ punctuation.section.box.end.wolfram *)
Should I?
This breaks strings like "FOO`BAR`BAZ`"
where this represents a context, which is a big regression that must be fixed before this is merged
This breaks strings like
"FOO`BAR`BAZ`"
where this represents a context, which is a big regression that must be fixed before this is merged
Emmm, it does. How do you plan to solve it?
I'm not sure I see a difference.
What do you mean by "difference"?
At first glance it looked like the escape \ fix was correct, but now I'm not so sure:
It looks good to me.
I'm not sure I see a difference.
What do you mean by "difference"?
When I pasted these code snippets and applied your changes, the highlighting didn't change at all.
This breaks strings like
"FOO`BAR`BAZ`"
where this represents a context, which is a big regression that must be fixed before this is mergedEmmm, it does. How do you plan to solve it?
I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:
(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])
It seems to work but @batracos didn't like it. @Shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support
At first glance it looked like the escape \ fix was correct, but now I'm not so sure:
It looks good to me.
Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.
I'm not sure I see a difference.
What do you mean by "difference"?
When I pasted these code snippets and applied your changes, the highlighting didn't change at all.
It's wield. Syntax rules before this PR should not have been able to handle these syntaxes. Can you tell me which commit?
I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:
(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])
It seems to work but @batracos didn't like it. @Shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support
I'm not sure I get your point by (?<=[^a-zA-Z0-9$]|[\"\`])
but the thing is slots in a StringTemplate can be placed just between letters:
In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>]
Out[11]= "FOO-BAZ`"
The syntax for python didn't work out a solution for string templates, either:
str = 'I just want a {plain} brace'
# ^^^^^^^ constant.other.placeholder.python
We don't show an attempt to format
it and it appears as if we do.
If you really want to distinguish those behaviour, maybe we can just match the function outside the string, for example, treat strings inside a Begin
function as "plain" to some extent. I think provide different functions with different inner context is a good solution to all the problems like this, and functions like Block
has applied this solution before.
I'm interested in it but suggest not to do such things in this PR but open a new one instead because this PR has already does too many things.
Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.
Mathematica is not that strict:
In[21]:= "\a" // InputForm
(* Syntax::stresc: Unknown string escape \a. *)
Out[21]//InputForm= "\\a"
In[31]:= "\a"
Out[31]= "\\a"
Should out program be that strict? I just don't know.
If yes, maybe we can preserve the current rule and append the following rule:
match: \\[\s\S]
scope: invalid.character.escape.wolfram
Clearly back slash on its own isn't supported from the second output, and when mathematica doesn't complain it's a bug. We should always highlight the character after \ in a string to indicate there is an escape happening, this is crucial to help visually indicate bugs.
Mathematica is not that strict:
In[21]:= "\a" // InputForm (* Syntax::stresc: Unknown string escape \a. *) Out[21]//InputForm= "\\a" In[31]:= "\a" Out[31]= "\\a"
Should out program be that strict? I just don't know.
If yes, maybe we can preserve the current rule and append the following rule:
match: \\[\s\S] scope: invalid.character.escape.wolfram
I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that \
followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but \
on its own should never be unformatted
I'm not sure, this is a much bigger issue than not supporting StringTemplate. I discussed this with @batracos at one point back in May actually and we couldn't find a good solution.. The closest thing I came up with was something along these lines for L140:
(?<=[^a-zA-Z0-9$]|[\"\`])\`\w*\`|\`\w*\`(?=[^a-zA-Z0-9$"])
It seems to work but @batracos didn't like it. @Shigma and @batracos can we improve this, use it, or come up with something better? If not, we should hold off on StringExpression backtick support
I'm not sure I get your point by
(?<=[^a-zA-Z0-9$]|[\"\`])
but the thing is slots in a StringTemplate can be placed just between letters:In[11]:= StringTemplate["FOO`BAR`BAZ`"][<|"BAR" -> "-"|>] Out[11]= "FOO-BAZ`"
The syntax for python didn't work out a solution for string templates, either:
str = 'I just want a {plain} brace' # ^^^^^^^ constant.other.placeholder.python
We don't show an attempt to
format
it and it appears as if we do.If you really want to distinguish those behaviour, maybe we can just match the function outside the string, for example, treat strings inside a
Begin
function as "plain" to some extent. I think provide different functions with different inner context is a good solution to all the problems like this, and functions likeBlock
has applied this solution before.I'm interested in it but suggest not to do such things in this PR but open a new one instead because this PR has already does too many things.
Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.
Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.
I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).
Is this OK?
I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that
\
followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but\
on its own should never be unformatted
I see. Hope the following rules are satisfactory: https://github.com/ViktorQvarfordt/Sublime-WolframLanguage/blob/4fa3e82dd4f677e88056e9f50f4093d74a00785c/WolframLanguage.sublime-syntax#L137-L161
I found only [a-zA-Z]
except [nrtbf]
can result in a error and [()"!^%&+_*@`/\\]
after a back-slant will be recognized as a special character. Under other circumstances, any character after a back-slant will be recognized as usual:
In[69]:= "\8" // Characters
Out[69]= {"\\", "8"}
Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.
I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).
Is this OK?
This is a great idea, I'll test it soon.
Contexts show up in many other strings. We absolutely cannot break this for StringTemplated strings. All strings can contain normal backticks. Only in the case for StringTemplate are the backticks special. What this means is assume and default to the backtick being a normal backtick in a string, unless we can properly distinguish. I was suggesting a hacky solution so that sometimes StringTemplate would be right but that contexts would always be right, but I'm less convinced by this. I 100% think we should not support StringTemplate backtick syntax, please remove it from this PR unless there is a much, much better solution.
I don't entirely agree with you but given that the colorization for string template is a break change I decide to remove all the related rules from general string recognition but preserve them for some functions (specifically StringTemplate and TemplateApply) and assignment for usages (which will automatically use string template when displayed in a message).
Is this OK?
I want to bring the idea of if there are spaces around the backticks back on the table. While it won't fix every other StringTemplate, it would fix most of the ones that I use. There would just be some instances (when the backticks where next to other characters but it was actually a StringTemplate and not inside the functions we whitelisted.. so very rare). I also suggest we add Success and Failure to the list of functions which commonly use StringTemplate and always turn on the syntax highlighting for this.
I think you missed my point. Mathematica IS that strict. If the message isn't appearing, it is a bug that it isn't appearing. That syntax is never actually valid, and we must indicate that
\
followed by any char means it's an attempted escape. I'm fine, and even encourage giving these a different different colors (valid vs invalid escapes) but\
on its own should never be unformattedI see. Hope the following rules are satisfactory:
Sublime-WolframLanguage/WolframLanguage.sublime-syntax
Lines 137 to 161 in 4fa3e82
escape characters
- match: \\[-"nrtbf()!^%&+_*@`/\\] scope: constant.character.escape.wolfram - match: |- (?x)( \\[0-7]{3}| \\\.[0-9A-Fa-f]{2}| \\:[0-9A-Fa-f]{4} ) scope: constant.character.encoding.wolfram - match: \\\[({{named_characters}})\] scope: constant.character.built-in.wolfram # invalid characters - match: |- (?x)( \\[0-7]{1,2}(?=[^0-7])| \\\.[0-9A-Fa-f]?(?=[^0-9A-Fa-f])| \\:[0-9A-Fa-f]{0,3}(?=[^0-9A-Fa-f]) ) scope: invalid.character.encoding.wolfram - match: \\\[\w+\] scope: invalid.character.built-in.wolfram - match: \\[a-zA-Z] scope: invalid.character.escape.wolfram
I found only
[a-zA-Z]
except[nrtbf]
can result in a error and[()"!^%&+_*@`/\\]
after a back-slant will be recognized as a special character. Under other circumstances, any character after a back-slant will be recognized as usual:In[69]:= "\8" // Characters Out[69]= {"\\", "8"}
At first glance this seems to work, but it does look like it can be cleaned up. Perhaps another thing to put an issue for, but I'll take another look next week
To be honest I never work with boxes so I'm just not sure if this is expected:
Current Release:
This PR's latest commit:
It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/ could you give some insight on everything past f in the list perhaps?
[nrtbf()"!^%&+_*@`/\\]
This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html apparently \000 is a thing too..
I tried to respond to most questions tonight, I won't be back on over the weekend. I plan to leave your most recent commits in my local sublime for a few days at work next week before approving so I have time to see if I notice anything with my normal workflows.
To be honest I never work with boxes so I'm just not sure if this is expected:
Current Release:
This PR's latest commit:
Maybe I worked with boxes before and just forgot all about it later on ...
It looks a bit to me like some of the escape character rules are leaking here. I wasn't really sure why this was the list of escape characters, in particular ()%+_@/ could you give some insight on everything past f in the list perhaps?
[nrtbf()"!^%&+_*@`/\\]
This is the best resource I've found: https://reference.wolfram.com/language/tutorial/InputSyntax.html apparently \000 is a thing too..
See this:
In[153]:= Characters /@ {"\+", "\-", "\>"}
Out[153]= {{"\+"}, {"\\", "-"}, {}}
@chere005 The following code may better depict the escaping behavior:
Quiet @ Last @ Reap[
Scan[
Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
CharacterRange[33, 126]
],
_,
#1 -> StringJoin[#2] &
]
With the following result:
.01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz
!"%&()*+/@\^_`bfnrt
#$',-89;=?]{|}~
<>
Any questions @batracos ?
Sorry for the long time off the project. I simply did not have the bandwidth to spare. Except for those three tests not passing this seems good to go.
Hold off on this until I can take another look..
Sent from my iPhone
On Mar 23, 2019, at 12:04 PM, batracos notifications@github.com wrote:
Sorry for the long time off the project. I simply did not have the bandwidth to spare. Except for those three tests not passing this seems good to go.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
@chere005 The following code may better depict the escaping behavior:
Quiet @ Last @ Reap[ Scan[ Sow[#, Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &, CharacterRange[33, 126] ], _, #1 -> StringJoin[#2] & ]
With the following result:
* **errored:** `.01234567:ABCDEFGHIJKLMNOPQRSTUVWXYZ[acdeghijklmopqsuvwxyz` * **escaped:** `` !"%&()*+/@\^_`bfnrt `` * **non-escaped:** `#$',-89;=?]{|}~` * **disappeared:** `<>`
@batracos and I discussed this, in a string the only characters that should show up as valid escape characters is nrtbf
Things like "\*"
which don't complain end up with a string "\\*"
which means the *
was not actually escaped. Please make this change and fix the tests that @batracos mentioned are failing before I make another pass at reviewing this.
@Shigma any plans to update this with the changes we mentioned?
Support built-in characters and character encoding in strings.
Not treat
\
as escape anymore, but special characters like\n
will be recognized as escapes as usual.Support string templates.
Support string representation of boxes.