Closed HaloFour closed 2 years ago
:+1: I'd love if C# supported heredocs
How's about something like
const string q = @" my
multine text that might have "
quotes and other things. < !>
"@;
So "string" is a normal string. @"string" is a verbatim string and @"string"@ is a heredoc. This is pretty much how powershell does it.
@mburbea I don't think that specific syntax would work, it's ambiguous. Consider:
string s = @"foo"; // bar"@;
Is that a verbatim string containing foo followed by a comment or a heredoc containing foo"; // bar?
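To make the ambiguity concrete, here is a minimal sketch showing that the line above already compiles under today's rules. The class and method names are illustrative only and are not part of any proposal.

```csharp
using System;

public static class AmbiguityDemo
{
    // Under current C# rules the first unescaped double-quote ends the
    // verbatim string, so everything after the semicolon is a line comment.
    public static string Current()
    {
        string s = @"foo"; // bar"@;
        return s;
    }
}
```

Today `AmbiguityDemo.Current()` returns `foo`. A heredoc reading of the same characters would instead yield `foo"; // bar`, so adopting `@"..."@` would change the meaning of code that already compiles.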
I like this proposal. I think this is a better alternative to the verbatim strings I have previously proposed.
@HaloFour You should make (and link to) a UserVoice suggestion so people can vote on it (or adopt an existing one if it's available)
@mburbea Conceptually it's the same thing but I'm basing my syntax loosely on C++11's implementation, although I'm moving the double-quotes as it looks so bizarre having parts of the delimiter appear within them.
The problem with adopting PowerShell's syntax would probably be that the starting syntax is ambiguous with existing verbatim strings and that the compiler would have to scan until it hits either "@ or the end of the file in order to determine what kind of literal it is as it couldn't treat any individual double-quotes as terminators. @svick's example further demonstrates how the syntax could be ambiguous. I think that a new starting syntax is required in order to ensure that the compiler knows what it will be looking for as a terminator.
@MgSam Thanks. I did find an existing UserVoice suggestion which uses the C++11 syntax and I linked to that.
I'd like PowerShell-like here-strings rather than C#'s verbatim strings too.
@ufcpp, at this point, I don't think "rather" is an option.
That's why, while I like the powershellish way, the initial proposal would be the best one, as it is unambiguous.
Yeah. I'm sure there are lots of things that the language designers would love to reconsider about C#, but hindsight is 20/20. I don't imagine that verbatim strings are anywhere close to the chopping block.
The one thing I really do like about C++'s raw strings is that custom delimiter syntax so there really is no such thing as one sentinel ending sequence which would require some kind of escaping. With PowerShell here-strings you're still stuck if for whatever reason the string would have to contain a new-line followed by "@.
@gafter With a proposed syntax and some examples does this issue need anything further to be considered a proposal?
@HaloFour, how about a mix from the suggestion in the referenced issue above - doesn't @` solve most of the ambiguities touched upon above?
I see backticks as a cleaner separation for delimiters than parentheses, since parentheses complicate parsing and also feel keyword-ish.
@`"default raw string "` (@-backtick)
@`delim"delimited "raw" string "delim` (@-backtick)
@"Verbatim string!"
"Normal string"
$"{Interpolated} string"
Edit: Or how about, always start on a new line and always end with a newline with @, or simply python style triple quotes followed by new lines, or something mixed like this:
var rawString = @"""
This always starts on a "new line".
And also ends on a new line with the `surrounding symbols`.
"""@
But I guess this has its problems as well, since there's no way to avoid the CR/CRLF in the last line.
One more variant (Sanest so far):
var str = @`
Always starts on a new line. Nothing special simply uses @ followed by back-tick.
But end doesn't have the same restrictions.`;
var rawStrWithBackTick = @`delim
Again, starts on a new line. But uses `delim` as the delimiter since "`" is used here.delim`;
This also makes it very easy for the compiler to implement. @ followed-by backtick until new-line to get the delimiter.
@prasannavl No, using any specific character as the terminator will require the escaping of that character. In your case you'd need a way to escape back-ticks, which do legitimately appear in strings, such as with quoted identifiers in MySQL-dialect SQL and in nHibernate XML mappings. My syntax avoids requiring (or supporting) any form of escaping since the developer can decide on the sequence of characters which terminate the raw string and as long as that sequence (and suffix) doesn't appear within the string they can include literally any text they want. As noted, this is how C++ tackled raw strings, so I think that it sets a decent precedent.
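As a rough illustration of "the developer can decide on the sequence", here is a sketch of how a delimiter could be chosen for arbitrary text under the proposed @(delim"..."delim) form. `DelimiterPicker` and the `d0`, `d1`, ... naming scheme are assumptions made for this example and are not part of the proposal.

```csharp
using System;

public static class DelimiterPicker
{
    // Returns a delimiter (possibly empty) such that the terminating
    // sequence '"' + delimiter + ')' does not occur anywhere in content.
    public static string Pick(string content)
    {
        // If the text never contains ") then the plain @(" ... ") form works.
        if (!content.Contains("\")")) return "";

        for (int i = 0; ; i++)
        {
            // d0, d1, ... are all legal C# identifiers.
            string candidate = "d" + i;
            if (!content.Contains("\"" + candidate + ")")) return candidate;
        }
    }
}
```

Because the content is finite, some candidate is always free, which is why no escape mechanism is needed with this kind of syntax.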
@HaloFour - I believe you responded to my initial draft comment which I shouldn't have posted - it indeed was quite flawed. My apologies for that. :)
I had updated it - quite significantly in fact.
@prasannavl Oh, ok. Really you just replace parentheses with back-ticks. Functionally they'd be identical. Doesn't really matter that much to me. The parentheses route is (mostly) borrowed from C++. I think I prefer it to back-ticks but that's just a personal preference.
I guess that works too. Just that it feels odd in C# land to me, where there would be some random delimiter word that seems exposed without the graceful protection bubble of some kind of a quote.
It also adds complexity to the language grammar. Static language grammars for say, syntax coloring might have to do a lot more work - may even be impossible in some, but very easy to implement when under backticks.
For a rather different syntax idea coming from the land of Perl 6:
Console.WriteLine(^"TERMINATOR");
Here are the contents of the heredoc.
Potentially multiple lines.
"TERMINATOR"
Console.WriteLine(^"another");
just getting the point out that the value for the terminator is user defined"another"
builds as
Console.WriteLine(@"Here are the contents of the heredoc.
Potentially multiple lines.
");
Console.WriteLine(@"just getting the point out that the value for the terminator is user defined");
That is: a string with a ^ before it starts and terminates with " within that statement, but the literal begins on the start of the line following the immediate end of statement and continues until the original literal is repeated in quotes.
@bbarry Same idea, conceptually.
I don't particularly care for the syntax, though. I think it would take me a few times rereading to realize that you weren't just printing TERMINATOR. It also suffers from the same issue that I have with C++ raw strings in that the terminator appears within the body of the string which, in my opinion, makes it confusing to see where the string really begins. Compound that with the fact that Perl will ignore any content between the terminator and the newline, which I think could lead to confusion: either accidentally reading the remainder of that line as content, or accidentally having the remainder of that line, intended as content, ignored.
@bbarry I don't think making C# line-break sensitive like that would be a good idea. Especially if you consider how this affects lines with multiple statements (which is not that common, but is valid). For example, with this code:
Console.WriteLine(^"TERMINATOR"); i++;
Here are the contents of the heredoc.
"TERMINATOR"
If I understand your proposal correctly (maybe I don't, I haven't looked at how exactly the Perl version works), then adding a line break before the i++; statement (normally a safe style change) would change the meaning of the code. I don't like that.
@svick yeah it would. I don't really have much of an opinion on it one way or the other, just stating another potential way to have raw strings embedded in C# which could be used to draw some inspiration from.
It looks particularly strange in LINQ statements when working with interpolation:
var result = from code in xelement.Elements("Employee")
let name = code.Element("Name").Value
let address = code.Element("Address")
let street = address.Element("Street").Value
let city = address.Element("City").Value
let state = address.Element("State").Value
let zip = address.Element("Zip").Value
let letterbox = $^"end"
orderby zip, name
select letterbox;
{name}
{street}
{city} {state}, {zip}
"end"
The issue I have which made me think about it was a call to Page.ClientScript.RegisterClientScriptBlock in some webforms code, followed by an unfortunately large string of js and was missing , true from the end:
Page.ClientScript.RegisterClientScriptBlock(redacted, redacted,
@"$('form').submit(function(){
... (way too many lines)
});");
I would use back-tick for the template string literal. This would be consistent with ES6 javascript as well. Check out node-edge, which can blend C# code into Node.JS and vice-versa: https://github.com/tjanczuk/edge
ES6
In ES6 you can use template strings to write multiline C# code.
var edge = require('edge');

var helloWorld = edge.func(`
    async (input) => {
        return ".NET Welcomes " + input.ToString();
    }
`);

helloWorld('JavaScript', function (error, result) {
    if (error) throw error;
    console.log(result);
});
Parity with existing languages, especially C based, makes it easier to cross pollinate.
@ericnewton76 C# already has template strings:
var s = $"Hello {person.Name}.";
var ms = $@"This is a multiline templated string literal,
Mr. {person.LastName}.";
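Ok, here's a small set of exercises. Try feeding the ES6 sample starting with edge.func into C# using that... Try pasting HTML code in there...
Results are: It blows up on the { }. And @ strings still need the incredibly common double-quotes to be escaped. Hence the need for actual string literal templates and a proposal to use the relatively uncommon backtick. How common is the backtick within strings? Not very.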
Why does markdown use triple backtick for marking text as code? Because backticks are uncommon.
@ericnewton76 Backticks are the identifier qualifier in MySQL and in nHibernate. They're actually quite common.
This proposal is not for string templates. It is for string literals that contain no internal formatting or specific escape sequences. If you want to discuss an alternative syntax for string templates or string interpolation I'd suggest opening a new proposal. But I will state that since C# already has a syntax for this feature that the likelihood of it being considered is not good.
Ok, here's a small set of exercises. Try feeding the ES6 sample starting with edge.func into C# using that... Try pasting HTML code in there... Results are: It blows up..
This indicates a need to improve the compiler. It does not mean that these features need to be added to the language.
For example, in the past, the C# compiler had code to be resilient to people writing things like int foo[] in it (which is not legal C#).
Similarly, I added support to the lexer/parser to be resilient to seeing git merge markers in the code. All this can be done with no language changes.
@CyrusNajmabadi
I believe "it blows up" is a euphemism for "it results in code that is unable to compile" since any such content pasted into a C# string requires additional escaping. It's possible that you could make the compiler more resilient to that so that the errors remain somewhat localized, but you wouldn't be able to solve the problem without further modifications.
Tooling could help here. When you paste content into a string in IntelliJ it automatically inserts whatever escaping is necessary. I believe you can also copy from a string and it removes that escaping.
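For reference, the escaping such tooling would have to apply for verbatim strings is small. The sketch below is only an illustration of the idea: `VerbatimPaste` and its method names are hypothetical and do not describe how IntelliJ or any other editor implements it.

```csharp
using System;

public static class VerbatimPaste
{
    // Wraps raw text in a verbatim literal, doubling every double-quote.
    public static string ToVerbatimLiteral(string raw) =>
        "@\"" + raw.Replace("\"", "\"\"") + "\"";

    // Reverses the doubling for the body of a verbatim literal,
    // i.e. the text between the opening @" and the closing ".
    public static string FromVerbatimBody(string body) =>
        body.Replace("\"\"", "\"");
}
```

This round-trips cleanly, but the source text still looks different from the original, which is the readability complaint that tooling alone does not address.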
But I still think that a raw string literal format would be helpful, especially when working with text in other editors or in raw source views like in GitHub. It was useful enough for C++ to add it.
I don't think I've ever run into problems in Python with their triple quotes strings.
I'd vote for that approach.
@michael-hawker I can go with that... Python's triple quotes or Triple backtick or something...
Something is better than nothing. C# only really supports @ for string literals, and of course that doesn't work with any html cut/pasted directly (when it has double quotes for attribute values)
Closing out: this issue was migrated over to https://github.com/dotnet/csharplang/discussions/89 4 years ago, and is now tracked in https://github.com/dotnet/csharplang/issues/4304 as a championed language proposal.
Problem:
Verbatim string literals are often used to copy in chunks of text such as SQL or XML. It's not uncommon for this text to contain embedded double-quotes, which then must be doubled-up in C# in order to be escaped. This makes it obnoxious if you have to copy and paste the text back and forth since it would have to be fixed each time.
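A small illustration of the doubling described above. The XML fragment and the `VerbatimQuotes` class are made up for this example.

```csharp
public static class VerbatimQuotes
{
    // Text as it appears in the XML file:  <user name="alice" role="admin" />
    // Inside a verbatim literal every embedded " has to be written as "".
    public const string Xml = @"<user name=""alice"" role=""admin"" />";
}
```

Copying the fragment back out of the source file means undoing each doubled quote by hand, and pasting an updated version in means redoing them.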
Solution:
By supporting a custom notation for starting and ending a string literal it enables evaluating the double quotes as part of the string that do not have to be escaped. The programmer would use a pattern at the beginning of the string which then must be matched to denote the end of the string.
Proposed syntax would be based on verbatim string syntax:
raw-string-literal:
    @(" raw-string-literal-characters opt ")
    @( identifier " raw-string-literal-characters opt " identifier )
raw-string-literal-characters:
    Any characters except " identifier opt )
The @ would be immediately followed by an open-parenthesis ((). Following that is an optional delimiter which is any legal C# identifier and then a double-quote. Next would be the raw string contents which would be interpreted exactly as they appear in the file. The terminator of the string literal would be a double quote immediately followed by the optional delimiter and a close-parenthesis ()).
The following example is a raw string without a custom delimiter. It is terminated simply by the sequence "). Double quotes that appear within the string are interpreted as a part of the string as long as they are not followed by a close-parenthesis character ()).
The following example is a raw string with a custom delimiter. As the custom delimiter is specified as "foo" that sets the terminating sequence to be "foo). Any other characters are interpreted as a part of the string literal, including any portion of the terminating sequence. Note that the term "foo" appears within the string after a double-quote, but since it is not also followed by a close-parenthesis ()) it is not considered the terminating sequence of the literal.
In this example the custom delimiter is "foo". Note that the attribute value is not considered as the end of the string. However, if it was followed by a parenthesis it would be and a different custom delimiter might have to be chosen.
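The termination rule described above can be sketched as a small scanner. This is only an illustration of the proposed rule, not compiler code: `RawStringScanner` is a hypothetical name, and the delimiter check is simplified to letters, digits and underscores rather than the full C# identifier rules.

```csharp
using System;

public static class RawStringScanner
{
    // Tries to read a literal of the proposed form @(delim"contents"delim)
    // starting at index 0 of source. Returns false if the text does not match.
    public static bool TryRead(string source, out string contents)
    {
        contents = "";
        if (!source.StartsWith("@(", StringComparison.Ordinal)) return false;

        // The optional delimiter runs from after "@(" up to the first double-quote.
        int openQuote = source.IndexOf('"', 2);
        if (openQuote < 0) return false;
        string delimiter = source.Substring(2, openQuote - 2);

        // Simplified identifier check (assumption, see the note above).
        foreach (char c in delimiter)
            if (!char.IsLetterOrDigit(c) && c != '_') return false;

        // The literal ends at the first occurrence of '"' + delimiter + ')'.
        string terminator = "\"" + delimiter + ")";
        int bodyStart = openQuote + 1;
        int end = source.IndexOf(terminator, bodyStart, StringComparison.Ordinal);
        if (end < 0) return false;

        contents = source.Substring(bodyStart, end - bodyStart);
        return true;
    }
}
```

Given the input `@(foo"<a b="foo">x</a>"foo)`, the scanner skips the `"foo"` attribute value because it is not followed by a close-parenthesis, and stops only at the final `"foo)`. Since the terminator is known as soon as the opening double-quote is reached, a single forward search is enough.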
Here are examples of raw strings containing the syntax for other raw strings:
This syntax is loosely based on C++11 raw string literals, although I personally don't care for the parentheses or delimiter appearing within the body of the string. I'm definitely not married to this syntax and variations are welcome.
Existing UserVoice suggestion, using C++11-like syntax: Allow to have custom delimiters in raw string literal