brownplt / pyret-lang

The Pyret language.

ellipsis character causes indenter to misbehave #1444

Open shriram opened 5 years ago

shriram commented 5 years ago
fun size-d(d):
  size-cof(d.cof) … size-cod(d.cod)
end

fun foo():

end

The line inside foo indents to the left column. If I replace the … (single ellipsis character) with ... (dot-dot-dot), the indenter recovers.

shriram commented 5 years ago

In general this seems to be a Unicode problem, but losing … seems especially unfortunate…

blerner commented 5 years ago

This isn't a valid Pyret program: identifiers are defined to be "letters, digits, dollars, underscores or dashes (not starting with a digit)". Pyret only ever handles non-ASCII characters within string literals or comments; anywhere else causes an error. So we have several choices:

  1. Relax that restriction, and subject ourselves to the madness that is Unicode-awareness everywhere.
  2. Accept the ellipsis character as a special case.
  3. Automagically convert the ellipsis character into three periods, much as we convert smart-quotes.

Options 1 and 2 require that browsers' JS engines have Unicode-aware regexps, or that we hack around their lack by explicitly rewriting our regexps to use hex-escapes instead of character classes. (We currently do this for our strings, but it makes the regexes basically unmaintainable, as in, we got the regex wrong several times and I am still not 100% confident in the current result. I am hesitant to do the same for identifiers, especially since identifiers are a much more common regexp to need to run...)
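To make the regexp trade-off concrete, here is a rough TypeScript sketch of the three styles of identifier pattern. All three patterns and their names are illustrative assumptions built from the quoted definition above; they are not Pyret's actual tokenizer regexps, and the hex-escaped version covers only a few Latin-1 letters as an example.

```typescript
// Hypothetical patterns for illustration only; not taken from Pyret's source.

// ASCII-only: letters, digits, dollars, underscores or dashes,
// not starting with a digit (a literal reading of the quoted definition).
const asciiIdentifier = new RegExp("^(?![0-9])[A-Za-z0-9$_-]+$");

// Hex-escape workaround: letter ranges are spelled out by hand.
// Only the Latin-1 letters are listed here; a real version would need
// many more ranges, which is where the maintenance burden comes from.
const hexEscapedIdentifier = new RegExp(
  "^(?![0-9])[A-Za-z0-9$_\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF-]+$"
);

// Unicode-aware: needs an engine that supports the "u" flag and
// Unicode property escapes (\p{L} = letters, \p{N} = numbers).
const unicodeIdentifier = new RegExp("^(?!\\p{N})[\\p{L}\\p{N}$_-]+$", "u");

const samples = ["size-cof", "2fast", "caf\u00e9", "\u2026"];
for (const s of samples) {
  console.log(
    JSON.stringify(s),
    asciiIdentifier.test(s),
    hexEscapedIdentifier.test(s),
    unicodeIdentifier.test(s)
  );
}
```

Note that the ellipsis (U+2026) is punctuation rather than a letter, so it fails all three patterns in this sketch; accepting it would still need an explicit special case.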

Anecdotally, apropos the ellipsis character: at least one plagiarism case was found because the student copy/pasted code from a blog, whose blogging platform had "helpfully" autoconverted three dots into an ellipsis.

I just genuinely don't see this as a high-priority issue: the folks most likely to encounter it are teachers copy/pasting from Word documents, and we can help them via option 3.
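For reference, option 3 could look roughly like the TypeScript sketch below. The `normalizePastedText` helper and the `Normalized` shape are hypothetical names invented for this illustration; this is not the editor's existing smart-quote conversion code, and how it would hook into the paste handler is left open.

```typescript
// Hypothetical helper for illustration; not part of Pyret or its editor.
interface Normalized {
  text: string;     // pasted text after conversion
  replaced: number; // how many ellipsis characters were rewritten
}

function normalizePastedText(pasted: string): Normalized {
  let replaced = 0;
  // U+2026 is the single-character horizontal ellipsis.
  const text = pasted.replace(/\u2026/g, () => {
    replaced += 1;
    return "...";
  });
  return { text, replaced };
}

const result = normalizePastedText("size-cof(d.cof) \u2026 size-cod(d.cod)");
console.log(result.text);
console.log(result.replaced);
```

The `replaced` count is there so the caller could show a hint when something was rewritten. As written, the sketch converts every ellipsis in the pasted text, including any that sit inside string literals or comments.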

blerner commented 5 years ago

Do we want to use … as a synonym for ..., in terms of template expressions? I'm tentatively thinking I can do that, but it's fiddly and I don't want to spend time on it unnecessarily.

shriram commented 5 years ago

I think auto-converting to ... would address this issue (the way you handle curly-double-quotes) — that way if it does end up entered into the source, it disappears (and hints to the user, "don't use that, use this"). That was your suggestion #3 earlier. It looks like you're going back on it?

[Btw, this doesn't only impact Word users. I actually use … all the time because it's easy to type on a Mac keyboard. … … … … …]

blerner commented 5 years ago

It occurred to me just now that it would prevent you from even typing the ellipsis character in strings: we don't have a way to enter it, and if pasting auto-converted it...then it would always go away. I need to double-check whether the auto-conversion has a "Click here to undo" option, and if it does, then I'm fine with option #3.