Open shriram opened 8 years ago
[Please read the textual source: github is totally screwing up the presentation due to backquotes confused with Markdown.]
(I've edited your comment so that it contains spaced-out triple quotes instead. Now it renders better.)
I pretty much frustratedly hate everything about triple-quoted strings :( They don't lex well, they don't indent consistently, we haven't figured out a good story for ripping out the line-leading whitespace, they screw up both CM and emacs' readers (because three delimiter characters is one character too many for either of them to handle) and require hacks to fix, etc etc. They even screw up Github's pasting of code ;-)
Because of the leading whitespace, they don't work well for their original intended use, of lengthy doc-strings. Can we change their semantics so they rip out the newline-plus-whitespace-up-to-first-character on every line, so that it effectively de-indents by whatever the first line's worth is?
The workaround on GitHub for writing triple quotes inside a fenced (triple-backquote) code block is to use a four-space-indented code block instead.
I recall that Python has the same problem with docstrings. Here's their solution: https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation
I don't think we should adopt it, though. It's not obvious how the algorithm works to users.
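For reference, Python's standard library exposes the PEP 257 trimming rule as `inspect.cleandoc`. A small sketch of what it does (the sample string is made up for illustration): the first line is stripped on its own, and the remaining lines are de-indented by the smallest indentation found among the non-blank ones, which is the part that may not be obvious to users.

```python
import inspect

# Made-up docstring body, written with explicit "\n" so the whitespace is unambiguous.
raw = (
    "First line.\n"
    "        Second line, indented to match the code.\n"
    "            Third line, indented further.\n"
    "    "
)

# cleandoc strips the first line separately, removes the common margin
# (8 spaces here) from the later lines, and drops trailing blank lines.
cleaned = inspect.cleandoc(raw)
print(cleaned)
```

Note that the margin is inferred from the content of the string, not from where the string literal starts in the source.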
I was going to say, we should study what Python does and just copy it…
Proposal: Docstrings lex and know the starting column number of the first backtick. They de-indent all lines by that amount. Anything that's outdented further than that leads to a well-formedness error.
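A minimal sketch of this proposal, in Python rather than in Pyret's actual lexer. Everything here is hypothetical: the names `dedent_by_column` and `OutdentError`, the choice to exempt whitespace-only lines, and the choice to count only spaces (not tabs) as indentation are assumptions made for illustration, not decisions taken in this thread.

```python
class OutdentError(ValueError):
    """Hypothetical stand-in for a well-formedness error."""


def dedent_by_column(body: str, start_col: int) -> str:
    """Strip start_col columns of leading spaces from every line after the first.

    body is the text between the delimiters; start_col is the 0-based column
    of the first backtick of the opening delimiter.
    """
    lines = body.split("\n")
    # The first line begins right after the opening delimiter, so it is kept as-is.
    out = [lines[0]]
    for lineno, line in enumerate(lines[1:], start=2):
        if line.strip() == "":
            # Assumption: whitespace-only lines are exempt and become empty.
            out.append("")
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent < start_col:
            raise OutdentError(
                f"line {lineno} is indented {indent} columns, "
                f"less than the delimiter's column {start_col}"
            )
        out.append(line[start_col:])
    return "\n".join(out)


# The string opens at column 4; later lines lose exactly 4 columns.
example = dedent_by_column("word1\n    word2\n      word3", 4)
print(example)
```

Unlike the PEP 257 approach, the margin comes from the source position of the literal, so the result does not depend on how the other lines happen to be indented.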
I like this idea.
Vocab question – is "docstring" the word we should use to refer to triple-quoted strings? What about "string blocks" or "multi-line strings"?
Philip and I have just been abbreviating them as TQS, but that doesn't trip lightly off the tongue... I said docstring above because of this particular use case, but I don't much care what we call them in general.
@jpolitz This issue has potentially come up again with @esSteres's LSP work: apparently, VSCode and others can take docstrings that are Markdown-formatted, and pretty-print them. It'd be nice for us to use this for `doc:` strings, but Markdown treats 4+ spaces of indentation as code blocks. So we'd need to do this margin-stripping in order for the strings to be useful, and therefore we'd need to do this in the lexer (where we have the srcloc information for the string itself). Do we think that changing the spacing semantics of TQS literals now, five+ years after this issue was started, would be a massively inconvenient breaking change for our users?
I think it's fine to make a backward-breaking change.
Someday a full-blown #lang would make the change not even break anything; for now, `use context` can't help with that, but so be it.
I'm thinking a bit about this issue this morning, specifically where in the compiler pipeline we want to apply this cleanup. I don't really want to do it in the parser, since we get least-pleasant error messages from that phase, but we don't preserve the fact that triple-quoted strings were triple-quoted into the AST -- this line of code eliminates any lexical distinctions between how strings were written. I suppose we could throw a new `parseErrorMalformedTQSString` exception akin to the `parseErrorUnterminatedString` exceptions we already throw...
Could we just have the string nodes in the AST store information about whether they were triple-quoted, and if so, what their indentation level is? We talked previously about adding source locations to strings for LSP purposes; this would just be an extension of that (actually, this info should be able to be derived from the sourceloc). Not sure if it would fix the fact that `use context` wouldn't help, but it might?
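To make the idea concrete, here is a rough sketch in Python of a string node that remembers how it was written. These are not Pyret's actual AST definitions: `SrcLoc`, `StrLiteral`, and the `margin` property are hypothetical names, and the sketch assumes the node's source location starts at the opening delimiter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SrcLoc:
    line: int
    column: int  # assumed: 0-based column of the opening delimiter


@dataclass(frozen=True)
class StrLiteral:
    loc: SrcLoc
    text: str            # the string contents as written, before any de-indenting
    triple_quoted: bool  # the lexical distinction that is currently lost

    @property
    def margin(self) -> Optional[int]:
        # The indentation level is derived from the source location, so it
        # needs no separate field; ordinary strings have no margin.
        return self.loc.column if self.triple_quoted else None


doc = StrLiteral(SrcLoc(line=3, column=4), "word1\n    word2", triple_quoted=True)
plain = StrLiteral(SrcLoc(line=9, column=2), "hello", triple_quoted=False)
print(doc.margin, plain.margin)
```

With something like this, a later phase could do the margin-stripping and report outdented lines with a regular well-formedness error rather than a parse error.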
All `s-str` nodes have a srcloc in them, but not all strings show up as `s-str` nodes in the AST. I'm not sure which change would be more invasive to the overall AST design.
If I want to write
the indenter turns this into
This is really problematic, because the value becomes
which is not at all what I want! (Note the lack of leading space before "word1".)