aljazerzen opened this issue 11 months ago
Another pro: all rows of a multi-line string line up in an obvious way, the first line as well as all the following ones. See the first example, and compare it to the final...
Another con: I might like this so much that it will make me sad to type multi-line strings in other languages :-)
I agree with the pros & cons.
I do think it looks a bit alien, and in a different way from the Zig version. Possibly that's because in most languages this is a common invalid expression, i.e. an unfinished string.
(This is only an aesthetic concern, so let's not weigh it heavily.)
Given it's a very new construct and doesn't block anything, I think it's worth leaving this open for a while so we can contemplate it and aggregate views. (But we also shouldn't just leave it forever; let's make a decision in a couple of weeks?)
As mentioned in #3679, I'd rather have d-strings, probably without the d prefix so that they can apply to s- and f-strings too.
To wit:
This is a rough idea of the d-string:
- A d-string starts with a triple quote (""" or ''') followed by a newline. The newline is not included in the string.
- d"spam" and d'''egg''' are syntax errors.
- A d-string ends with an indent and a triple quote.
- Only the indent to be removed is allowed before the closing triple quote on its line.
- Indentation matching that of the last line is stripped from every line.
- A d-string can be combined with the f/F and r/R prefixes.
As for the delimiter, you could use d"""\n...\n""" or """\n...\n""" or even just "\n...\n". I would prefer some sort of triple quote because that is the convention, but it would not be necessary. The neat thing about this is that you can change the indentation level of the whole block just by changing the indentation of the closing delimiter.
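To make the d-string rules above concrete, here is a minimal Python sketch of the dedent step. It is not PRQL or prqlc code: parse_d_string is a hypothetical helper, it only handles a plain """ delimiter with no prefix letters, and it assumes (as with an ordinary triple-quoted string) that the newline before the closing delimiter is kept in the value.

```python
OPEN = '"""\n'   # opening delimiter plus the newline that must follow it
CLOSE = '"""'


def parse_d_string(src: str) -> str:
    # Hypothetical helper: `src` is the literal's full source text, from the
    # opening triple quote up to and including the closing one.
    if not src.startswith(OPEN):
        # The opening """ must be followed directly by a newline,
        # so something like '"""egg"""' is rejected.
        raise SyntaxError("d-string must start with a triple quote and a newline")
    if len(src) < len(OPEN) + len(CLOSE) or not src.endswith(CLOSE):
        raise SyntaxError("missing closing triple quote")

    # Drop the opening delimiter and its newline (not part of the value)
    # and the closing delimiter, then split into source lines.
    lines = src[len(OPEN):-len(CLOSE)].split("\n")

    # Whatever precedes the closing """ on its line is the indent to strip,
    # and only white-space is allowed there.
    indent = lines[-1]
    if indent.strip(" \t"):
        raise SyntaxError("only indentation may precede the closing triple quote")

    out = []
    for number, line in enumerate(lines[:-1], start=1):
        if line.startswith(indent):
            out.append(line[len(indent):])
        elif not line.strip():
            out.append("")  # assumption: blank lines may be shorter than the indent
        else:
            raise SyntaxError(f"content line {number} is indented less than the closing delimiter")

    # Assumption: the newline before the closing delimiter is kept, as it
    # would be in an ordinary triple-quoted string.
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    literal = '"""\n    this is the first line\n      second line\n    """'
    print(repr(parse_d_string(literal)))  # 'this is the first line\n  second line\n'
```

Because the indent is read from the closing delimiter's line, moving that line two columns to the left leaves two extra spaces at the start of every content line, which is the indentation-shifting property described above.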
I don't understand what the benefit would be of having to start each line with ". I guess it probably comes down to
- lexing of one line is (mostly) independent of lexing of the other lines. This simplifies the lexer.
but that has to be weighed against
- pasting a large number of lines would require you to prefix each one with a ".
The lexer is written once by a handful of people, whereas pasting a large number of lines will probably happen multiple times a day (/week/month) by thousands of people. I know I'm not one of the lexer writers, so it's a bit unfair for me to say, but I do feel the leading quotes would be a big usability impairment. Maybe your text editor can automate it, but then what about the Playground, or (once we fix that) DBeaver etc.?
let a = "
"this is the first line
"  second line, indented with two spaces
"  third line, also indented with two spaces
I find this very hard to parse by eye, and to me it is the opposite of significant whitespace.
I would much rather see
let a = "
this is the first line
  second line, indented with two spaces
  third line, also indented with two spaces
"
or better
let a = """
this is the first line
  second line, indented with two spaces
  third line, also indented with two spaces
"""
equivalent to (everything lines up above the closing delimiter):
let a = """
    this is the first line
      second line, indented with two spaces
      third line, also indented with two spaces
    """
Also, what if you wanted to quote dialogue from a play or, say, a chatbot interaction?
let quotes = """
"To PRQL, or not to PRQL: that is the query."
"If PRQL be the language of data, query on."
"Friends, SQL users, data enthusiasts, lend me your queries."
"""
It's a pretty niche use case, but with the increasing use of LLMs, perhaps the number of leading quotes will not be insignificant?
Let's discuss d-strings in a separate thread, with a full description of how they would work in PRQL.
the lexer is written once by a handful of people whereas pasting a large number of lines will probably happen multiple times a day (/week/month) by thousands of people. I know I'm not one of the lexer writers so it's a bit unfair for me to say but I do feel the leading quotes would be a big usability impairment. Maybe your text editor can automate it but then what about the Playground, or once we fix that then DBeaver etc... ?
It's not about code complexity, but about the quality of error recovery in the compiler and language server. In short, think about what happens when you have an unclosed quote: all characters up to the end of the file or the next quote become the contents of the string. This prevents the compiler from reporting any other errors, forces the language server to discard any previously computed information about the file, and occasionally leads to an incorrect error location:
from x
derive a = "hello
derive b = "world"
                 \___ unclosed string at line 3
For more info, read On Modularity of Lexical Analysis, which has been linked to a bit too much.
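A toy illustration of the error-location problem, using a deliberately tiny scanner that only tracks double quotes (hypothetical; this is not how prqlc's lexer is written). Run on the three-line query above, it blames line 3 when strings may span lines, and line 2, where the quote was actually left open, when a string must end on the line where it starts.

```python
def unclosed_string_line(src: str, strings_span_lines: bool):
    """Return the 1-based line where this toy scanner reports an unclosed
    string, or None if every string is closed."""
    line = 1
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            line += 1
        elif ch == '"':
            opened_at = line
            i += 1
            # Scan forward for the closing quote.
            while i < len(src) and src[i] != '"':
                if src[i] == "\n":
                    if not strings_span_lines:
                        # Single-line strings: the error is pinned to the
                        # line where the string was opened.
                        return opened_at
                    line += 1
                i += 1
            if i == len(src):
                # Reached end of file without a closing quote.
                return opened_at
        i += 1  # step past the current character (or the closing quote)
    return None


if __name__ == "__main__":
    query = 'from x\nderive a = "hello\nderive b = "world"\n'
    print(unclosed_string_line(query, strings_span_lines=True))   # 3
    print(unclosed_string_line(query, strings_span_lines=False))  # 2
```

With single-line strings a real lexer could also resume on the next line and keep looking for other errors, which is the recovery benefit argued for here; the sketch stops at the first error to stay short.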
whereas pasting a large number of lines will probably happen multiple times a day (/week/month) by thousands of people
This is a valid argument.
I'd say that this:
let a = "
"this is the first line
"  second line, indented with two spaces
"  third line, also indented with two spaces
... looks just as nice as this:
let a = """
this is the first line
  second line, indented with two spaces
  third line, also indented with two spaces
"""
Under my proposal above, this is valid:
let quotes = "
""To PRQL, or not to PRQL: that is the query."
""If PRQL be the language of data, query on."
""Friends, SQL users, data enthusiasts, lend me your queries."
I had one thought... I may be mistaken, but it seems like the most common use of multi-line strings in PRQL today would be from_text (at least, that's where I see them used most in the examples). Imagining for a moment that users wanted to use from_text in any sort of scripted capacity (again, at least, that's how I plan to use it in the near term), the suggestion to have " as a per-line prefix would make this use case quite painful. As an example, suppose the user was building a PRQL script templated with Jinja; currently this could be very straightforward:
from_text """
{{ source_csv_data }}
"""
derive {
  d = b + c,
  answer = 20 * 2 + 2,
}
You could also imagine a shell pipeline equivalent:
(
  echo 'from_text """'
  cat "$INPUT_CSV"
  echo '"""'
  echo 'derive { d = b + c }'
) | prqlc compile > script.sql
Yes, of course it's possible to do the necessary "quote prefix" operation in both of these cases; my only thought is that I'd expect most users to find having to do so frustrating. And while you could also stress that loading data with from_text is not preferred, it's precisely the cases where from_text is most useful (small translation tables that don't already exist in the database but live in a little CSV file) that would be most affected by this proposal.
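For reference, this is roughly the "quote prefix" step a template would need before splicing CSV into from_text under the proposed rules. It is a minimal Python sketch: to_block_string and render_script are hypothetical helpers, and the two-space indentation before each " is an arbitrary choice. With triple-quoted strings, the CSV could be inserted unchanged (as long as it contains no """).

```python
def to_block_string(text: str, indent: str = "  ") -> str:
    """Render `text` as a block string literal under the proposed rules.

    Hypothetical helper: emits the opening `"` and a newline, then one
    `"`-prefixed line per line of `text` (a trailing newline in `text`
    does not produce an extra empty line).
    """
    body = "\n".join(f'{indent}"{line}' for line in text.splitlines())
    return '"\n' + body


def render_script(csv_data: str) -> str:
    """What a template would have to emit so `csv_data` is valid inside from_text."""
    return f"from_text {to_block_string(csv_data)}\nderive {{ d = b + c }}\n"


if __name__ == "__main__":
    print(render_script("a,b,c\n1,2,3\n"))
```

In the shell pipeline, the same step could be done with sed 's/^/"/' "$INPUT_CSV" in place of cat, with a single " opening the literal instead of the triple quotes.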
Good point.
Ideally, people would not be using a templating engine over PRQL, as that means PRQL lacks constructs for what they are trying to do with the template. It also makes them prone to injection attacks. Nevertheless, people will want to use templates, and such string literals would be quite an inconvenient blocker for that.
Do we need this?
I think we should avoid introducing things that people are not familiar with and syntax that looks foreign to users.
If we introduce something like this, I think it would be preferable to go with the d-string, because it looks just like an ordinary string prefixed with a d.
The original motivation was to remove the other multi-line strings, as they would cause problems for tooling development and performance in the future.
Yeah, maybe we should remove the multi-line strings. If I understand correctly, they're a bit unorthodox: they have the syntax of normal strings but can span multiple lines. On top of that, we also have the triple double-quoted strings, which I think do the same thing.
Maybe we should just keep the triple double-quoted string as is, and change the double-quoted strings to not span multiple lines.
This is a follow-up to https://github.com/PRQL/prql/pull/3679#issuecomment-1763491707
I propose we add "block string literals", which:
- start with a " followed by a new line,
- continue with lines that each start with a " character (also discarded),
- where that " character may be preceded by white-space.
Example:
... equivalent to:
... equivalent to:
Pros:
Cons:
"
Inspired by zig: https://ziglang.org/documentation/master/#String-Literals-and-Unicode-Code-Point-Literals.
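Taking the Zig-style rule literally, each line of a block string can be lexed on its own: strip any leading white-space, check for the " prefix, and keep the rest verbatim. Below is a small Python sketch of that rule, assuming (hypothetically) that the caller has already seen the opening " at the end of the previous line and that, as in Zig, lines are joined with a newline and no trailing newline is added. lex_block_string is an illustrative name, not part of prqlc.

```python
def lex_block_string(lines: list[str], first: int) -> tuple[str, int]:
    """Collect a block string literal whose content starts at lines[first].

    Hypothetical helper: the caller has already seen the opening `"` at the
    end of lines[first - 1]. Returns the string value and the index of the
    first line that is not part of the literal.
    """
    parts = []
    i = first
    while i < len(lines):
        # White-space before the `"` prefix is indentation, not content.
        stripped = lines[i].lstrip(" \t")
        if not stripped.startswith('"'):
            break  # the first line without a `"` prefix ends the literal
        # Drop only the prefix; everything after it is kept verbatim,
        # including further quotes and leading spaces.
        parts.append(stripped[1:])
        i += 1
    # Like Zig: a newline goes between lines, none is added at the end.
    return "\n".join(parts), i


if __name__ == "__main__":
    source = [
        'let a = "',
        '  "this is the first line',
        '  "  second line, indented with two spaces',
        'from x',
    ]
    value, next_line = lex_block_string(source, 1)
    print(repr(value))  # 'this is the first line\n  second line, indented with two spaces'
    print(next_line)    # 3
```

Because only the first " on each line is treated as the prefix, a line such as ""To PRQL, or not to PRQL..." yields content that itself starts with a quote.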
The syntax that I propose differs in:
- using " instead of \\,
- and requiring a " on the first line, which is needed because our newlines have semantic meaning.
Possible extensions: