Indenter with ``` screws up content

shriram commented 8 years ago

If I want to write

words = string-split-all(```
word1
word2
word3
```, "\n")

the indenter turns this into

words = string-split-all(```
                         word1
                         word2
                         word3
                         ```, "\n")

This is really problematic, because the value becomes

> words
[list: "word1", "                         word2", "                         word3"]

which is not at all what I want! (Note the lack of leading space before "word1".)

shriram commented 8 years ago

[Please read the textual source: github is totally screwing up the presentation due to backquotes confused with Markdown.]

blerner commented 8 years ago

(I've edited your comment so that it contains spaced-out triple quotes instead. Now it renders better.)

I pretty much frustratedly hate everything about triple-quoted strings :( They don't lex well, they don't indent consistently, we haven't figured out a good story for ripping out the line-leading whitespace, they screw up both CM and emacs' readers (because three delimiter characters is one character too many for either of them to handle) and require hacks to fix, etc etc. They even screw up Github's pasting of code ;-)

Because of the leading whitespace, they don't work well for their original intended use, of lengthy doc-strings. Can we change their semantics so they rip out the newline-plus-whitespace-up-to-first-character on every line, so that it effectively de-indents by whatever the first line's worth is?

sorawee commented 8 years ago

The workaround in Github for writing triple quotes inside triple quote format is to use 4 characters indentation instead of triple quote format.

I recall that Python has the same problem for docstring. Here's their solution: https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation

I don't think we should adopt it, though. It's not obvious how the algorithm works to users.

shriram commented 8 years ago

I was going to say, we should study what Python does and just copy it…

blerner commented 7 years ago

Proposal: Docstrings lex and know the starting column number of the first backtick. They de-indent all lines by that amount. Anything that's outdented further than that leads to a well-formedness error.

jpolitz commented 7 years ago

I like this idea.

Vocab question – is "docstring" the word we should use to refer to triple-quoted strings? What about "string blocks" or "multi-line strings"?

On Tue, Aug 22, 2017 at 9:51 AM, Ben Lerner notifications@github.com wrote:

Proposal: Docstrings lex and know the starting column number of the first backtick. They de-indent all lines by that amount. Anything that's outdented further than that leads to a well-formedness error.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/brownplt/pyret-lang/issues/639#issuecomment-324086504, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHUU75zcmJJuZsuamTkTwD-nPgbTMWuks5sawcEgaJpZM4HVopo .

blerner commented 7 years ago

Philip and I have just been abbreviating then as TQS, but that doesn't trip lightly off the tongue... I said docstring above because of this particular use case, but I don't much care what we call them in general.

blerner commented 2 years ago

@jpolitz This issue has potentially come up again with @esSteres's LSP work: apparently, VSCode and others can take docstrings that are Markdown-formatted, and pretty-print them. It'd be nice for us to use this for doc: strings, but Markdown treats 4+ spaces of indentation as blockquotes. So we'd need to do this margin-stripping in order for the strings to be useful, and therefore we'd need to do this in the lexer (where we have the srcloc information for the string itself). Do we think that changing the spacing-semantics of TQS string literals now, five+ years after this issue was started, would be a massively inconvenient breaking change for our users?

shriram commented 2 years ago

I think it's fine to make a backward-breaking change.

Someday a full-blown #lang would make the change not even break anything; for now, use context can't help with that, but so be it.

blerner commented 2 years ago

I'm thinking a bit about this issue this morning, specifically where in the compiler pipeline we want to apply this cleanup. I don't really want to do it in the parser, since we get least-pleasant error messages from that phase, but we don't preserve the fact that triple-quoted strings were triple-quoted into the AST -- this line of code eliminates any lexical distinctions between how strings were written. I suppose we could throw a new parseErrorMalformedTQSStringexception akin to the parseErrorUnterminatedString exceptions we already throw...

esSteres commented 2 years ago

Could we just have the string nodes in the AST store information about whether they were triple-quoted, and if so, what their indentation level is? We talked previously about adding source locations to strings for LSP purposes; this would just be an extension of that (actually, this info should be able to be derived from the sourceloc). Not sure if it would fix the fact that use context wouldn't help, but it might?

blerner commented 2 years ago

All s-str nodes have a srcloc in them, but not all strings show up as s-str nodes in the AST. I'm not sure which change would be more invasive to the overall AST design.

On Fri, Nov 12, 2021, 12:28 PM esSteres @.***> wrote:

Could we just have the string nodes in the AST store information about whether they were triple-quoted, and if so, what their indentation level is? We talked previously about adding source locations to strings for LSP purposes; this would just be an extension of that (actually, this info should be able to be derived from the sourceloc). Not sure if it would fix the fact that use context wouldn't help, but it might?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/brownplt/pyret-lang/issues/639#issuecomment-967288625, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHAHQDDAXLAWVJZLKNRAT3ULVFCJANCNFSM4B2WRJUA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

brownplt / pyret-lang

Indenter with ``` screws up content #639