PowerShell / EditorSyntax

PowerShell syntax highlighting for editors (VS Code, Atom, SublimeText, TextMate, etc.) and GitHub!
MIT License

unescaped characters in the tmLanguage file? #139

Closed msftrncs closed 5 years ago

msftrncs commented 6 years ago

I've noticed that when I generate the tmLanguage file programmatically, I end up with more characters escaped (namely " and ') than are escaped in the file in the repository. I couldn't find any guidance on this.

In the repository:

<key>match</key>
<string>\b((?:\'|\")?)(\w+)((?:\'|\")?)(?:\s+)?(=)(?:\s+)?</string>

Programmatically generated:

<key>match</key>
<string>\b((?:\&apos;|\&quot;)?)(\w+)((?:\&apos;|\&quot;)?)(?:\s+)?(=)(?:\s+)?</string>

Any guidance here?

omniomi commented 6 years ago

How was that generated? The plist format does not require characters like quotes to be converted to SGML entities. Only the < and > symbols need to be converted to &lt; and &gt;. That said, I don't know whether using entities elsewhere would actually break anything... What happens if you convert it back using the build script?
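One way to answer the "convert it back" question is to parse both forms and compare the results. A minimal sketch, assuming Python's standard `plistlib` rather than the repository's build script, and a hypothetical helper `parse_match` that wraps each `<string>` body quoted above in a bare plist dict:

```python
import plistlib

# The two <string> bodies quoted in the issue, copied verbatim.
REPO_FORM = r"\b((?:\'|\")?)(\w+)((?:\'|\")?)(?:\s+)?(=)(?:\s+)?"
GENERATED_FORM = r"\b((?:\&apos;|\&quot;)?)(\w+)((?:\&apos;|\&quot;)?)(?:\s+)?(=)(?:\s+)?"


def parse_match(body: str) -> str:
    """Hypothetical helper: wrap a <string> body in a minimal XML plist and parse it."""
    doc = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<plist version="1.0"><dict>'
        f"<key>match</key><string>{body}</string>"
        "</dict></plist>"
    )
    return plistlib.loads(doc.encode("utf-8"))["match"]


repo_value = parse_match(REPO_FORM)
generated_value = parse_match(GENERATED_FORM)

# The XML parser decodes &apos; and &quot; back to ' and ", so both match.
print(repo_value == generated_value)  # True
```

Because `&apos;` and `&quot;` are predefined XML entities, a conforming XML parser should decode them to the same characters, so the two files would differ in their bytes but not in the regex they describe.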

msftrncs commented 6 years ago

@omniomi, I wrote a function to convert the JSON tmLanguage file, and used [System.Security.SecurityElement]::Escape(), which escapes all five characters (<, >, &, " and '). However, according to StackOverflow and w3.org, that's not exactly what should be done; it depends on where the contents are going, since some characters don't need to be escaped in some places. Even > doesn't need to be escaped in character data. (Kinda like the rules for regex character classes.)
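For comparison, here is a sketch in Python rather than PowerShell. It assumes the rule has already been loaded from the JSON grammar into a dict, and the pattern itself is hypothetical, chosen only because it contains all five special characters. Passing extra entities to `xml.sax.saxutils.escape` reproduces the escape-all-five behaviour, while `plistlib.dumps` escapes only `&`, `<` and `>` inside `<string>` values, so quotes come out unescaped, as in the repository file.

```python
import plistlib
from xml.sax.saxutils import escape

# Hypothetical rule; in a real converter this dict would come from json.load().
rule = {"match": r"(?<![\w'])(\"|')(&&)"}
pattern = rule["match"]

# Escape &, <, > plus both quote characters, similar in effect to
# [System.Security.SecurityElement]::Escape().
all_five = escape(pattern, {'"': "&quot;", "'": "&apos;"})

# plistlib's XML writer escapes only &, < and > in string values,
# leaving " and ' as literal characters.
plist_text = plistlib.dumps(rule).decode("utf-8")

print(all_five)
print(plist_text)
```

Both outputs are valid XML for the same pattern; the plistlib form just happens to match the repository's convention of leaving quotes alone.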