dotnet / csharplang

The official repo for the design of the C# programming language

[Proposal]: Remove restriction that interpolations within a non-verbatim interpolated string cannot contain new-lines. #4935

Open CyrusNajmabadi opened 3 years ago

CyrusNajmabadi commented 3 years ago

Allow new-lines in all interpolations

Summary

The language today supports non-verbatim and verbatim interpolated strings ($"" and $@"" respectively). The primary sensible difference between them is that a non-verbatim interpolated string works like a normal string: it cannot contain newlines in its text segments and must instead use escapes (like \r\n). Conversely, a verbatim interpolated string can contain newlines in its text segments (like a verbatim string), and doesn't escape newlines or other characters (except for "" to escape a quote itself).

This is all reasonable and will not change with this proposal.

What is unreasonable today is that we extend the restriction on 'no newlines' in a non-verbatim interpolated string beyond its text segments into the interpolations themselves. This means, for example, that you cannot write the following:

var v = $"Count is\t: { this.Is.A.Really()
                            .That.I.Should(
                                be + able)[
                                    to.Wrap()] }.";

Ultimately, the 'interpolation must be on a single line itself' rule is just a restriction of the current implementation. That restriction really isn't necessary, can be annoying, and would be fairly trivial to remove (see https://github.com/dotnet/roslyn/pull/54875 for how). In the end, all it does is force the dev to place things on a single line, or force them into a verbatim interpolated string (both of which may be unpalatable).

The interpolation expressions themselves are not text, and shouldn't be beholden to any escaping/newline rules therein.
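As a minimal sketch of what the proposal permits, the following wraps an interpolation expression across several lines inside a non-verbatim interpolated string. It assumes a compiler that implements this proposal (C# 11 or later); the names `Demo` and `Describe` are hypothetical and exist only for illustration.

```csharp
using System;
using System.Linq;

public static class Demo
{
    // Hypothetical helper: the interpolation hole spans several lines,
    // while the text segments of the string stay on their own lines.
    public static string Describe(int[] values)
    {
        return $"Sum of evens: {
            values.Where(v => v % 2 == 0)
                  .Sum()
        }.";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new[] { 1, 2, 3, 4 })); // prints "Sum of evens: 6."
    }
}
```

The newlines and indentation inside the hole are ordinary expression whitespace, so they do not appear in the resulting string.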

Specification change

single_regular_balanced_text_character
-    : '<Any character except / (U+002F), @ (U+0040), \" (U+0022), $ (U+0024), ( (U+0028), ) (U+0029), [ (U+005B), ] (U+005D), { (U+007B), } (U+007D) and new_line_character>'
-    | '</ (U+002F), if not directly followed by / (U+002F) or * (U+002A)>'
+    : <Any character except @ (U+0040), \" (U+0022), $ (U+0024), ( (U+0028), ) (U+0029), [ (U+005B), ] (U+005D), { (U+007B), } (U+007D)>
+    | comment
    ;

LDM Discussions

https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-09-20.md

DanielRosenwasser commented 3 years ago

FYI, you can use diff as the code fence language, so

````
```diff
 single_regular_balanced_text_character
-     : '<Any character except / (U+002F), @ (U+0040), \" (U+0022), $ (U+0024), ( (U+0028), ) (U+0029), [ (U+005B), ] (U+005D), { (U+007B), } (U+007D) and new_line_character>'
-     | '</ (U+002F), if not directly followed by / (U+002F) or * (U+002A)>'
+     : <Any character except @ (U+0040), \" (U+0022), $ (U+0024), ( (U+0028), ) (U+0029), [ (U+005B), ] (U+005D), { (U+007B), } (U+007D)>
+     | comment
     ;
```
````

will render as

```diff
 single_regular_balanced_text_character
-     : '<Any character except / (U+002F), @ (U+0040), \" (U+0022), $ (U+0024), ( (U+0028), ) (U+0029), [ (U+005B), ] (U+005D), { (U+007B), } (U+007D) and new_line_character>'
-     | '</ (U+002F), if not directly followed by / (U+002F) or * (U+002A)>'
+     : <Any character except @ (U+0040), \" (U+0022), $ (U+0024), ( (U+0028), ) (U+0029), [ (U+005B), ] (U+005D), { (U+007B), } (U+007D)>
+     | comment
     ;
```

CyrusNajmabadi commented 3 years ago

thanks!

alrz commented 3 years ago

from LDM:

We have a lot of compiler complexity around ensuring interpolated strings do not have a newline in them

Does the same reasoning apply to allowing ternaries in interpolation holes? I think there's already a proposal to follow TS design to accomplish that.

CyrusNajmabadi commented 3 years ago

Ternaries are harder; supporting them would take a substantial amount of effort here.

liamlim commented 3 years ago

Is there any chance to look at this again? https://github.com/dotnet/csharplang/issues/3414

I see EVERYBODY start their interpolations with a space, but you can't do that consistently if you decide to format a DateTime with a specific format. It's a really annoying issue. If you start the interpolation with a space, as everybody does, then you need to write ugly code like this:

var a = new DateTime();
var b = $"Value '{ a:yyyy-MM-dd}'"; // no trailing space in interpolated string => legal

davhdavh commented 2 years ago

Would it be possible to tell the compiler which line endings to use? Right now it is very hard to use multiline strings unless you add additional code to transform newlines.

jnm2 commented 2 years ago

@davhdavh Maybe open a new discussion on this? This thread is only talking about newlines inside interpolation holes, not inside the string contents, e.g.:

var x = $@"a{
    null
    }b";

which results in the string ab.

davhdavh commented 2 years ago

Ahh, my mistake. I totally misread the spec, and thought this would also allow newlines in the interpolated string itself.

phizch commented 2 years ago

@davhdavh One cool thing with this change is that you can do this:

const string NL = "\r\n";

string res =
$"This is {NL
}a string with multiple{NL
}lines!!";

There are also raw string literals, which are available with EnablePreviewFeatures enabled, but they wouldn't solve the line-ending problem.

Maybe a combination would work; at least it'll take care of the indentation.

string res = $"""
             This is{NL
             }a string with multiple{NL
             }lines!!
             """;