Open shriram opened 5 years ago
In general this seems to be a Unicode problem, but losing … seems especially unfortunate…
This isn't a valid Pyret program: identifiers are defined to be "letters, digits, dollars, underscores or dashes (not starting with a digit)". Pyret only ever handles non-ASCII characters within string literals or comments; anywhere else causes an error. So we have several choices:
Options 1 and 2 require that browsers' JS engines have Unicode-aware regexps, or that we hack around their lack by explicitly rewriting our regexps to use hex-escapes instead of character classes. (We currently do this for our strings, but it makes the regexes basically unmaintainable, as in, we got the regex wrong several times and I am still not 100% confident in the current result. I am hesitant to do the same for identifiers, especially since identifiers are a much more common regexp to need to run...)
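To make the trade-off above concrete, here is a minimal sketch of the three styles of identifier regexp. It assumes a simplified reading of the quoted rule (letters, digits, dollars, underscores or dashes, not starting with a digit). All names are hypothetical and none of these patterns is Pyret's actual tokenizer regexp. The hex-escaped variant covers only a small Latin range, purely for illustration.

```typescript
// Hypothetical patterns, simplified from the quoted identifier rule.
// These are NOT Pyret's real tokenizer regexps.

// ASCII-only: any non-ASCII letter makes the match fail.
const asciiIdent = new RegExp("^(?![0-9])[A-Za-z0-9$_-]+$");

// Unicode-aware: relies on the engine supporting the "u" flag and \p{...} escapes.
const unicodeIdent = new RegExp("^(?!\\p{Nd})[\\p{L}\\p{Nd}$_-]+$", "u");

// Hex-escape workaround: letter ranges are spelled out by hand.
// Only Latin-1 Supplement through Latin Extended-B letters are listed here
// (an illustrative subset). Note the gaps that skip U+00D7 and U+00F7,
// which are symbols rather than letters.
const hexIdent = new RegExp(
  "^(?![0-9])[A-Za-z0-9$_\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u024F-]+$"
);

// Small helper so the three patterns can be compared on the same input.
function isIdentifier(pattern: RegExp, text: string): boolean {
  return pattern.test(text);
}
```

Even in this toy version, the hex-escaped class has to carve out individual code points by hand, which is where mistakes tend to creep in as the covered range grows.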
Anecdotally, apropos the ellipsis character: at least one plagiarism case was found because the student copy/pasted code from a blog, whose blogging platform had "helpfully" autoconverted three dots into an ellipsis.
I just genuinely don't see this as a high-priority issue: the folks most likely to encounter it are teachers copy/pasting from Word documents, and we can help them with option 3.
Do we want to use … as a synonym for ..., in terms of template expressions? I'm tentatively thinking I can do that, but it's fiddly and I don't want to spend time on it unnecessarily.
I think auto-converting … to ... would address this issue (the way you handle curly double quotes): that way, if it does end up entered into the source, it disappears (and hints to the user, "don't use that, use this"). That was your suggestion #3 earlier. It looks like you're going back on it?
[Btw, this doesn't only impact Word users. I actually use … all the time because it's easy to type on a Mac keyboard. … … … … …]
It occurred to me just now that it would prevent you from even typing the ellipsis character in strings: we don't have a way to enter it, and if pasting auto-converted it...then it would always go away. I need to double-check whether the auto-conversion has a "Click here to undo" option, and if it does, then I'm fine with option #3.
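For reference, the auto-conversion under discussion could look roughly like the sketch below. This is an assumption-laden illustration, not the editor's actual paste handler: `normalizePastedText`, `PasteResult`, and the replacement table are hypothetical names. It reports how many characters were rewritten so that a caller could decide whether to offer an undo.

```typescript
// Hypothetical result type: the converted text plus a count of rewritten characters.
interface PasteResult {
  text: string;
  replaced: number;
}

// Hypothetical replacement table: typographic characters and their ASCII spellings.
const REPLACEMENTS: Array<[RegExp, string]> = [
  [/\u2026/g, "..."],       // ellipsis character to three dots
  [/[\u201C\u201D]/g, '"'], // curly double quotes to a straight double quote
];

// Rewrites pasted text and counts every substitution made.
function normalizePastedText(pasted: string): PasteResult {
  let replaced = 0;
  let text = pasted;
  for (const [pattern, ascii] of REPLACEMENTS) {
    text = text.replace(pattern, () => {
      replaced += 1;
      return ascii;
    });
  }
  return { text, replaced };
}
```

A nonzero `replaced` is the signal a caller would use to show something like a "Click here to undo" prompt. Without such a prompt, a sketch like this would also rewrite an ellipsis that was meant to stay inside a string literal.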
The line inside foo indents to the left column. If I replace the … (single ellipsis character) with ... (dot-dot-dot), the indenter recovers.