dotnet / csharplang

The official repo for the design of the C# programming language

[Proposal]: Raw string literal #4304

Open CyrusNajmabadi opened 3 years ago

CyrusNajmabadi commented 3 years ago

Raw string literal

Summary

Allow a new form of string literal that starts with a minimum of three double-quote (") characters (but no maximum), optionally followed by a new_line, the content of the string, and then ends with the same number of quotes that the literal started with. For example:

var xml = """
          <element attr="content"/>
          """;

Spec: https://github.com/dotnet/csharplang/blob/main/proposals/raw-string-literal.md
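As a rough illustration of the "no maximum" part of the rule, here is a sketch in the syntax that eventually shipped in C# 11, which may differ in details from this early proposal. Content that itself contains a run of three quotes needs a four-quote delimiter, and a single-line form covers short values such as a regex.

```csharp
using System;

// Content containing a run of three quotes needs a delimiter of at least four quotes.
// The closing delimiter's indentation (four spaces) is removed from the content line.
var quoted = """"
    She typed """ to open a raw string.
    """";

// Single-line form: no newline after the opening delimiter.
var regex = """("|')*""";

Console.WriteLine(quoted);
Console.WriteLine(regex);
```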


Special thanks to @jnm2 for a deep review of this proposal

AartBluestoke commented 3 years ago

Is there a way that this literal could be combined with $ to embed tokens, or is this completely literal strings only?

string exampleJson = $"""
                     {{
                         "name": "{this.thingName}"
                     }}""";

(I expect the answer is 'no: raw means raw, with no interpolation'.)

CyrusNajmabadi commented 3 years ago

Is there a way that this literal could be combined with $ to embed tokens, or is this completely literal strings only?

I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.

The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with json, will be just as painful as it was in the past.

So, tbh, I believe this should just be for raw-strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)

Tragen commented 3 years ago

In my opinion, Example 2 should throw an error and is confusing. The ending delimiter must be on its own line. So the string doesn't end with a newline, just as it doesn't start with one. If you want an empty line at the end, add an empty line. Perhaps this makes it also easier for the parser.

AartBluestoke commented 3 years ago

"Perhaps this makes it also easier for the parser." @Tragen Agreed; that would also allow runs of quotes to appear mid-string. However, how would you indicate whether the block of text ends in a newline or not?

Tragen commented 3 years ago

That's easy. Add an empty line.

No empty line at the end (the last character in the string is >):

var xml = """
          <element attr="contents">
            <body>
            </body>
          </element>
          """;

With an empty line at the end:

var xml = """
          <element attr="contents">
            <body>
            </body>
          </element>

          """;

For me, that is much more intuitive and logical
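For comparison, this is roughly the behavior that shipped in C# 11: the closing delimiter of a multi-line raw string sits on its own line, the newline just before it is not part of the value, and an extra blank line is what produces a trailing newline. A minimal sketch reusing the element names from the examples above:

```csharp
using System;

// Closing delimiter on its own line: the newline before it is not part of the value.
var noTrailingNewline = """
    <element attr="contents">
      <body>
      </body>
    </element>
    """;

// An extra blank line before the closing delimiter adds exactly one trailing newline.
var withTrailingNewline = """
    <element attr="contents">
      <body>
      </body>
    </element>

    """;

Console.WriteLine(noTrailingNewline.EndsWith(">"));     // True
Console.WriteLine(withTrailingNewline.EndsWith("\n"));  // True
```

The trailing newline is whatever line ending the source file uses (LF or CRLF), which is why the check above only looks at the final "\n".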

merarischroeder commented 3 years ago

Is there a way that this literal could be combined with $ to embed tokens, or is this completely literal strings only?

I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.

The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with json, will be just as painful as it was in the past.

So, tbh, I believe this should just be for raw-strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)

Actually, I believe the escaping problem is about double-quote marks specifically. Raw strings plus interpolation should therefore be possible and useful for at least HTML, XML, and non-C-style markup/languages, but this is something that can be deferred to the future.

Examples:

example[0] = $"""<a href="{url}">{label}</a>""";
example[1] = $"""<tiger age="{age}"><eyes colour="{eye_color}" count="2"/></tiger>""";

(I am using a single-line mode for these examples for brevity)

Examples with raw strings that would have braces:

var templateName = "C# Example Generator";
var code = $"""
void Example(string Name)
{{
    Console.WriteLine($"Hello {{Name}}, Welcome to {templateName}");
}}
""";

Although the braces still need escaping, the ability to include raw double-quotes makes this much easier to read.

merarischroeder commented 3 years ago

I don't see any conceptual paradox in having single-line and multi-line raw string literals with de-indentation.

example[0] = """raw string here"""; //closing quote is found on the same-line, so there is no multi-line processing to do
example[1] = """multiline string here
                with no de-indentation, because 
                the string opener was not followed by new-line""";
example[2] = """
                this string can be de-indented
                because the string opener
                was directly followed by new-line"""; //it shouldn't matter if the string closer is here, or on the following line, the first line's indentation is the reference-point.
example[3] = """
                this also means, that indentation
                       can increase above the base-line
                       the same amount of spaces are 
                       still removed according to the base-line
"""; //even if the string closer has zero indent

Perhaps it isn't impossible to implement, but would it be much more complex, or is the spec system not flexible enough?

HaloFour commented 3 years ago

@Tragen

In my opinion, Example 2 should throw an error and is confusing. The ending delimiter must be on its own line. So the string doesn't end with a newline, just as it doesn't start with one. If you want an empty line at the end, add an empty line. Perhaps this makes it also easier for the parser.

I disagree. There's nothing confusing about the closing quote being on the same line. It's exactly how text blocks work in Python and Java and it is not a problem in either of those languages.

Tragen commented 3 years ago

@HaloFour A lot of other languages disagree with you. When you can have it on the same line, then you would have an empty line at the beginning in all of the examples in the first post.

HaloFour commented 3 years ago

@Tragen

Other languages are welcome to do what they wish, but given two major languages have adopted the behavior proposed above it demonstrates that there is nothing inherently confusing about it.

Tragen commented 3 years ago

Because major languages have some features doesn't automatically mean that it isn't confusing. E.g. C++ is very confusing.

HaloFour commented 3 years ago

@Tragen

Many different ways to skin that cat. To be honest I kind of prefer C++'s general approach to raw strings over text blocks since you're given a lot of flexibility to customize the delimiters while still retaining the syntax of a string (unlike heredocs in many languages). See the syntax I originally proposed here: https://github.com/dotnet/csharplang/discussions/89

I will admit that having the closing delimiter on a separate line does make it easier to control the indentation without including that final newline character, and Cyrus was a little surprised that Java does include that newline when the delimiter is on the next line (so does Python).

CyrusNajmabadi commented 3 years ago

Because major languages have some features doesn't automatically mean that it isn't confusing.

It does help with the argument though. Ultimately, either approach will need to be learned. Given that this doesn't really seem to have been a problem for many other languages, I'm not too worried for us. That said, I'm certain we'll discuss that option when we design this.

CyrusNajmabadi commented 3 years ago

For me, that is much more intuitive and logical

I'm certain we'll discuss this during the design process.

CyrusNajmabadi commented 3 years ago

I don't see any conceptual paradox in having single-line and multi-line raw string literals with de-indentation.

I'm certain we'll discuss this during the design process.

CyrusNajmabadi commented 3 years ago

Although the braces still need escaping,

We'll likely discuss this, though I'm personally against it. It will depend on what the rest of the LDM wants here.

Needing to escape defeats the purpose here. Once you have to escape something, you're back where you started. The goal of these strings was to allow you to embed any content and not have to deal with escaping at all.

YairHalberstadt commented 3 years ago

There's a conflation of two different issues here:

  1. Supporting the ability to define raw string literals which require no escaping.
  2. Trimming indentation whitespace from literals.

I don't see that they necessarily have to come packaged together.

For example I would often want to indent interpolated strings as well.

It's also not clear how often raw literals have to be constants, and can't afford the overhead of calling something like .TrimIndentation() on them. I imagine the main use case would be tests, where such overhead would be marginal.

CyrusNajmabadi commented 3 years ago

It's also not clear how often raw literals have to be constants, and can't afford the overhead of calling something like .TrimIndentation() on them

My position is that that's what would be wanted the majority of times. As such, doing it by default should just be how the language works. Why foist it on the user to have to add that extra work when it can just be the default oob behavior?

CyrusNajmabadi commented 3 years ago

I don't see that they necessarily have to come packaged together.

They don't. But if we do raw strings, I think we might as well do both, to allow the literals to be ergonomically formatted without any downsides.

I'm sure though that we'll discuss this in the design meetings.

HaloFour commented 3 years ago

@YairHalberstadt

Java went through a similar design process and initially considered them separate with the inclusion of a helper method to align and trim the incidental whitespace. That was found to be more confusing and unattractive. Furthermore, since the helper method at runtime had less information regarding the formatting of the source around the string it ended up being necessary to include sentinel characters within the String to help inform it as to where the margin was supposed to be.

See: https://openjdk.java.net/jeps/326

I agree with Cyrus, the margin trimming is the most common thing you'd want to do and it's trivial to manage how the compiler behaves by the positioning of the delimiters. IDEs can include visual hints as to where the margin will be (as IntelliJ does with Java).


CyrusNajmabadi commented 3 years ago

IDEs can include visual hints as to where the margin will be

Yes. I intend to do this as part of the implementation.

jnm2 commented 3 years ago

This is great. By the time I got to the examples, they were already doing everything I intuitively wanted them to be doing. The indentation removal (or lack of indentation inclusion) is excellent and I would like to use it for things like EF/Dapper SQL queries.

I like the fact that you can explicitly include or exclude an ending newline by putting """ on the same line as the last line vs putting it on the next line. If there was a totally blank line before the ending """, I would strongly intuit that there would be two ending newlines. On the other hand, I could get used to anything. A newline is excluded at the top every time already.

There are a bunch of cases where I'd love to be able to use interpolation together with not having to escape double quote characters. For example: https://github.com/nunit/nunit3-vs-adapter/blob/master/src/NUnit.TestAdapter.Tests.Acceptance/SinglePassingTestResultTests.cs#L47-L60 Using raw strings without interpolation just for the benefit of excluding indentation and not having to escape quotes is probably something that would be quite hard to read if you have to inject values.

jairbubbles commented 3 years ago

Any thoughts on line endings? I once saw a case where a multi-line text was used in a unit test. As the source code was on git and autocrlf was set to true the string had different line endings when compiled on linux vs windows leading to different behaviors.

I personally think it would be cool to have a way to specify what line endings the string should have and not rely on line endings of the file itself.

IanYates commented 3 years ago

Any thoughts on line endings? I once saw a case where a multi-line text was used in a unit test. As the source code was on git and autocrlf was set to true the string had different line endings when compiled on linux vs windows leading to different behaviors.

I personally think it would be cool to have a way to specify what line endings the string should have and not rely on line endings of the file itself.

This is worthy of consideration. Some prefix sign ahead of the string (as we have $ and @ now) perhaps? I was also thinking a lot of "tabs to/from spaces" converters may need to get smarter here too. Having the IDE clearly indicate the common indent and show if it has a mix of tabs and spaces in it would be very helpful.

Allowing string interpolation seems reasonable to me. This does reduce the "can paste anything" ability, making it more of an easier way to include blocks of text with quotes in them. However, that seems a reasonable trade-off, as it's very opt-in (it only works if the user places the $ in front).

Finally, the original proposal to have the closing quotes on their own line seems sensible to me. Imagine overwrite-pasting a good chunk of text - so much easier to select whole lines than to select many lines and then all bar the last N characters of the last line. I would much prefer to have the closing quotes on their own line.

CyrusNajmabadi commented 3 years ago

Any thoughts on line endings?

I would preserve them as is. It's intentionally a raw string, not an interpreted one. :-)

As the source code was on git and autocrlf was set to true the string had different line endings when compiled on linux vs windows leading to different behaviors.

Sounds like a problem for all strings. Don't do that :-D

SimonCropp commented 3 years ago

would preserve them as is.

so different behavior based on what OS the code is built on?

CyrusNajmabadi commented 3 years ago

so different behavior based on what OS the code is built on?

No. I would preserve them as is. So whatever the contents of the file are. Do not use auto-crlf. It's unnecessary and outright broken.

The two ecosystems have tools that are fine with either line ending. Having your source control tool messing with this just isn't a good idea.

IanYates commented 3 years ago

so different behavior based on what OS the code is built on?

No. I would preserve them as is. So whatever the contents of the file are. Do not use auto-crlf. It's unnecessary and outright broken.

The two ecosystems have tools that are fine with either line ending. Having your source control tool messing with this just isn't a good idea.

I agree with auto-crlf being dodgy, although it is very popular.

Visual Studio itself suggests to me occasionally that I have mixed line endings in files and suggests to fix them. I suppress that message - we're a small team and 99% of the time just use Visual Studio. However those mixed line endings still sneak in and it'd be tricky if they were to cause confusion in things like string lengths being different.

For my use cases that come to mind it wouldn't hurt me but being sure of it would be nice.

CyrusNajmabadi commented 3 years ago

Visual Studio itself suggests to me occasionally that I have mixed line endings in files and suggests to fix them. I suppress that message - we're a small team and 99% of the time just use Visual Studio. However those mixed line endings still sneak in and it'd be tricky if they were to cause confusion in things like string lengths being different.

Honestly, if you have intentional mixed newlines, the right solution there IMO is simply to be explicit with line endings. i.e. actually use real escapes like \r\n. This is explicit, safe, and always resilient to whatever tooling you have in your development stack.
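Where explicit escapes are impractical (for example, a large expected-output literal in a test), one alternative is to normalize both sides before comparing. A minimal sketch: the helper name SameTextIgnoringLineEndings is hypothetical, while string.ReplaceLineEndings(string) is a real method available from .NET 6 onward.

```csharp
using System;

// Hypothetical test helper: compare multi-line text while ignoring whether the
// source file was checked out with LF or CRLF line endings.
static bool SameTextIgnoringLineEndings(string expected, string actual) =>
    expected.ReplaceLineEndings("\n") == actual.ReplaceLineEndings("\n");

// The same literal as it would come out of a CRLF checkout and an LF checkout,
// written with explicit escapes so the example does not depend on this file's endings.
string fromCrlfCheckout = "<a>\r\n  <b/>\r\n</a>";
string fromLfCheckout = "<a>\n  <b/>\n</a>";

Console.WriteLine(fromCrlfCheckout == fromLfCheckout);                             // False
Console.WriteLine(SameTextIgnoringLineEndings(fromCrlfCheckout, fromLfCheckout)); // True
```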

vbcodec commented 3 years ago

Can you include the ability to add a language designator?

var s = """SQL
select * from TX
""";

Then tooling (VS) could colorize the content, and maybe provide other features.

What if the content includes many "? Maybe it's better to use combinations (quotes plus a number) to make delimiters shorter:

var s = """3
""""""""""1""""""""2222""""""""""
"""3;

A delimiter """ followed by a number can't be included in the content. It is way better than """"""""""".

alrz commented 3 years ago

Can you include the ability to add a language designator?

I think tooling will be able to preserve that setting for strings anyway. R# doesn't need any clue in the source to make that happen.

vbcodec commented 3 years ago

R# doesn't need any clue in the source to make that happen.

The SQL designator won't be included in the string; it is only for C# tooling.

CyrusNajmabadi commented 3 years ago

Can you include the ability to add a language designator?

No. And that wouldn't work if we want to allow text at that position.

Then tooling (VS) could colorize the content, and maybe provide other features.

We already have ways to do that which don't require a designator in the string.

What if the content includes many "

I don't see that occurring in practice. So no real need to deal with that speculative case.

vladd commented 3 years ago

I wonder what is going to happen if the prefixes mix tabs and spaces.

Consider the following code (. means space, # means tab):

string s = """
........first line
##second line
........""";

In an editor with default settings (tab = 4 spaces) the first line and second line would look aligned, but the language shouldn't know what the tab-length-in-spaces is, right?

HaloFour commented 3 years ago

@vladd

I wonder what is going to happen if the prefixes mix tabs and spaces.

Assuming the compiler considers tabs at all they should be considered a single whitespace character irrespective of any editor feature that renders them as any specific number of spaces. Mixing whitespace is a bad idea in general and I'd recommend that the editor warn when it encounters mixed whitespace in a text block as well as visually indicate where the effective margin would be calculated.

CyrusNajmabadi commented 3 years ago

would look aligned

The spec deals with this. The common Whitespace prefix here would be the empty string. This would be readily apparent in the ide due to how we'd visualize the left margin of the string value.

vladd commented 3 years ago

@CyrusNajmabadi

The spec deals with this. The common Whitespace prefix here would be the empty string. This would be readily apparent in the ide due to how we'd visualize the left margin of the string value.

Ok, you've escaped the need to recalculate tabs into spaces in that case, but what about another case?

string s = """
........first line
#........
........third line
........""";

How many spaces would the blank line produce?

HaloFour commented 3 years ago

@vbcodec

Can you include the ability to add a language designator?

Editors can already provide syntax highlighting of a language within a language without requiring that the language accommodate this hint.

Although, I could see the ability to put a designator there as an interesting alternative to a variable number of double-quotes when it comes to specifying the terminal delimiter:

// no additional designator
var block1 = """
             Hello!
""";

// additional designator is "
var block2 = """"
             Hello!
"""";

// additional designator is @
var block3 = """@
             Hello!
@""";

// additional designator is SQL
var block4 = """SQL
             Hello!
SQL""";

That said, supporting such custom designators is probably overkill.

yaakov-h commented 3 years ago

Roslyn already has language hints e.g. /* lang=regex */ for syntax highlighting purposes.

CyrusNajmabadi commented 3 years ago

How many spaces would the blank line produce?

The algorithm does not use blank lines to determine the Whitespace prefix. So the prefix here would just be "........". We then trim that prefix from every line if we can. We can't remove it at all from the blank line. So the blank line would be inserted verbatim into the final string value.
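A minimal sketch of the prefix rule as described in this comment, assuming indentation is compared character by character (so a tab never matches a space) and whitespace-only lines are skipped when computing the prefix. It mirrors the description here, not necessarily the final spec; Dedent, LeadingWhitespace and CommonPrefix are hypothetical names.

```csharp
using System;
using System.Linq;

// Sketch of the de-indentation rule described in this thread (an assumption,
// not the final C# 11 specification):
//   1. Whitespace-only lines are ignored when computing the prefix.
//   2. The prefix is the longest common leading-whitespace string of the other lines.
//   3. The prefix is removed from every line that starts with it; other lines stay verbatim.
static string Dedent(string[] lines)
{
    var indents = lines
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Select(line => LeadingWhitespace(line))
        .ToList();

    string prefix = indents.Count == 0 ? "" : indents.Aggregate((a, b) => CommonPrefix(a, b));

    var trimmed = lines.Select(line =>
        line.StartsWith(prefix, StringComparison.Ordinal) ? line.Substring(prefix.Length) : line);
    return string.Join("\n", trimmed);
}

// Leading run of spaces and tabs, kept exactly as written (no tab expansion).
static string LeadingWhitespace(string line) =>
    new string(line.TakeWhile(c => c == ' ' || c == '\t').ToArray());

// Longest common prefix of two strings, compared character by character.
static string CommonPrefix(string a, string b)
{
    int i = 0;
    while (i < a.Length && i < b.Length && a[i] == b[i]) i++;
    return a.Substring(0, i);
}

// vladd's example ('.' = space, '#' = tab): the prefix is eight spaces, and the
// whitespace-only line that starts with a tab cannot be trimmed, so it is kept as-is.
string[] example = { "        first line", "\t        ", "        third line" };
Console.WriteLine(Dedent(example).Replace("\t", "#"));
```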

vladd commented 3 years ago

So the blank line would be inserted verbatim into the final string value.

This seems to be quite counter-intuitive for the user. I'd suggest a warning (either in the compiler or at least in the IDE) if the common prefix is shortened due to tab/space mismatch in different non-blank lines.

CyrusNajmabadi commented 3 years ago

This seems to be quite counter-intuitive for the user.

I don't think intuition around tabs/spaces is ever going to be there.

I'd suggest a warning (either in the compiler or at least in the IDE)

That's fine. It's out of scope for the design here IMO.

CyrusNajmabadi commented 3 years ago

the common prefix is shortened due to tab/space mismatch in different non-blank lines

I don't like this. Feels like some reasonable code could not be constructed without warnings. It seems fine, for example, for someone to want tabs to be the way to indent, but then have spaces in the string beyond the common prefix.

TBH, I think you're overthinking this. As stated above, I expect we'll have an IDE feature that just draws the appropriate lines through all the indent levels, so you'll see what's going on. Just like not mixing LFs and CRLFs haphazardly, don't do the same for whitespace. And if it really is a deep concern, then just have a style analyzer that says: don't do this in this repo plz. :)

vbcodec commented 3 years ago

Roslyn already has language hints e.g. /* lang=regex */ for syntax highlighting purposes.

Didn't know it existed, and it even works with VB :) It would be great if the C# team made it extensible, so everyone could write their own extension to colorize/check/autocomplete strings in a desired language. This feature will create significant demand for such extensibility.

vladd commented 3 years ago

@CyrusNajmabadi

the common prefix is shortened due to tab/space mismatch in different non-blank lines

I don't like this. Feels like some reasonable code could not be constructed without warnings. It seems fine, for example, for someone to want tabs to be the way to indent, but then have spaces in the string beyond the common prefix.

No, this case

string s = """
##|....indented line 1
##|....indented line 2
##|""";

is not covered by my idea (I use | to denote the end of the common prefix). I'd rather warn about this:

string s = """
|##....indented line 1
|............indented line 2
|##""";

which seems to be the source of surprise (why is the vertical line not at position 8? maybe VS is buggy. how would I know that I need to check all the line beginnings?). As well, I'd prefer to detect this easily:

string s = """
##|....indented line 1
............
##|....indented line 2
##|""";

because the IDE wouldn't give me any visual clue that the whole 12 spaces are going to be in the blank line instead of the "obvious" 4 (obvious because when my caret is at the end of the blank line, it visually aligns with the beginning of the text in the indented lines).

CyrusNajmabadi commented 3 years ago

because the IDE wouldn't give me any visual clue that the whole 12 spaces are going to be in the blank line instead of the "obvious" 4

Yes, it would give a visual clue there. The ide will show the left string margin for every line.

So you would see:

string s = """
##|....indented line 1
|............
##|....indented line 2
##|""";

I'd rather warn about this

As mentioned above, I expect warnings can be written for these cases by anyone interested. I don't see it as necessary as part of the language.
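As a rough sketch of the kind of check such a warning could perform, here is a plain function (not a Roslyn analyzer; the name HasTabSpaceMismatch is hypothetical) that detects the first case vladd describes: two non-blank lines whose indentation diverges on a tab versus a space, which silently shortens the common prefix.

```csharp
using System;
using System.Linq;

// Hypothetical check: true when the indentation of two non-blank lines diverges
// on a tab versus a space. Whitespace-only lines are skipped, so the
// "blank line with extra spaces" case is not covered by this sketch.
static bool HasTabSpaceMismatch(string[] lines)
{
    var indents = lines
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Select(line => new string(line.TakeWhile(c => c == ' ' || c == '\t').ToArray()))
        .ToList();

    for (int a = 0; a < indents.Count; a++)
    {
        for (int b = a + 1; b < indents.Count; b++)
        {
            string x = indents[a], y = indents[b];
            int i = 0;
            while (i < x.Length && i < y.Length && x[i] == y[i]) i++;
            // Both indents still have characters at i, so they differ there: one tab, one space.
            if (i < x.Length && i < y.Length) return true;
        }
    }
    return false;
}

string[] consistent = { "\t\t    indented line 1", "\t\t    indented line 2" };
string[] mixed = { "\t\t    indented line 1", "            indented line 2" };
Console.WriteLine(HasTabSpaceMismatch(consistent)); // False
Console.WriteLine(HasTabSpaceMismatch(mixed));      // True
```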

mpawelski commented 3 years ago

This is a great proposal. I had something similar in mind since the last time it came up (#4013).

General remarks to proposal

As I said, this is a great proposal. Basically all the examples do what I want them to do.

I also read how the similar Java feature works, and I think it's worth mentioning what it does differently:

  1. Removing trailing white space from the end of lines.

Trailing white space is most often unintentional, idiosyncratic, and insignificant. It is overwhelmingly likely that the developer does not care about it. Trailing white space characters are similar to line terminators, in that both are invisible artifacts of the source code editing environment. With no visual guide to the presence of trailing white space characters, including them in the content would be a recurring source of surprise, as it would affect the length, hash code, etc, of the string.

Please don't do this. It feels like the compiler is doing too much magic behind my back. I think it's a terrible idea; I only bring it up here to ask you not to do this. 😉

On a more serious note, I would expect that if I copy some text file, put it inside a "raw string literal", and then indent it to match the nesting of my code, the string should be exactly the same as it is in the text file I copied it from. No more magical removal of trailing whitespace.

  2. Normalizing line endings to '\n'

Line terminators in the content are normalized from CR (\u000D) and CRLF (\u000D\u000A) to LF (\u000A) by the Java compiler. This ensures that the string derived from the content is equivalent across platforms, even if the source code has been translated to a platform encoding (see javac -encoding).

I think we have no choice here other than to keep the actual line endings as they are, for consistency's sake. We already don't do any special handling of line terminators in @" verbatim strings, and neither should we here.

But I can imagine scenarios where it would be helpful, like the already mentioned situation where line endings differ because of different git "autocrlf" settings. "autocrlf" seems like a solution to a non-problem to me (all text editors I've used on Windows seem to handle '\n' without issues), but it is very popular and it's the default setting for "Git for Windows".

What if we had a compiler switch for it?

If you could set this at the solution level (for example with Directory.Build.props), it would be a nice addition to ensure that code compiles the same way in every environment, without having to explain to every person on the team how git's "autocrlf" is unnecessary.

    <Project>
        <PropertyGroup>
            <Deterministic>true</Deterministic>
            <LineEndingForStringLiterals>LF</LineEndingForStringLiterals>
        </PropertyGroup>
    </Project>

I included deterministic flag here just as an example of other setting that helps in ensuring consistent build results on all environments.

Single line variant

I don't see any reason not to have a single-line variant. The example with a regex that includes a " character is already mentioned in the proposal.

Please do a single-line variant of this new "raw string literal" syntax. What you wrote in the proposal seems to be the right thing to do:

We could consider relaxing this and allowing single line forms like: var regex = """("|')*""";. In this case, we would simply not allow newlines in the middle. Or, if we allowed newlines, we would do no dedenting logic.

This would probably make verbatim strings a bit obsolete*. But this is how languages evolve. We still have anonymous delegate syntax and I have never seen it used in real code.

*people would probably still use it for the common case of passing windows file paths since they just want to avoid escaping \.

String interpolation.

I would really like to be able to use string interpolation for this new "raw string literal" syntax. And I agree that we would run into escaping problem again with it (having to write those ugly {{ and }}).

My proposal is to improve current string interpolation feature.

Allow a different {expression} syntax based on the number of $ you write at the beginning of the string literal. For one $ you would use just {expression} for interpolation, just as you do now. For $$ it would be ${expression}. For $$$ it would be $${expression}, etc. You could use this improved syntax for all kinds of string literals: standard (" "), verbatim (@" "), and the newly proposed raw string literal (""" """).

As an example, in my real-world codebase I have a 3rd-party library with its own templating syntax, which itself has syntax to include arbitrary JavaScript where you need to escape { with {{. So when I use string interpolation I have to write {{{{ 😵.

Imagine if I could write it like this (imagine it's nicely colored by IDE):

image

Here I used this "improved" string interpolation syntax on the new "raw string literal" ($$"""). In this case I could use it on a verbatim string (@") as well, since I don't care about removal of indentation and can use ' instead of " in JavaScript's string literals. But I would prefer the new "raw string literal" syntax because I don't want to even think about whether I can use ' or ", and whitespace indentation removal is a nice addition to have.

Here's another contrived example showing how we wouldn't have to escape the ${expression} syntax of JS template literals, and how, to interpolate a variable in C#, we could write $${expression} because we mark the string with three $:

    var foo = "FOO";
    var jsCodeGenerated = $$$"""
                          var jsVariable = "$${foo}";
                          console.log(`
                            jsVariable is: ${jsVariable}
                            Look ma, I generated JS code with template literals.
                            Using C# string interpolation and raw literal syntax!`);
                          """;

However, if this "improved" string interpolation is "too much", then allowing people to use the current string interpolation would be fine as well. You still have the {{ escaping problem, but you have to opt in to it with $ anyway. And it would make the language more consistent to be able to use string interpolation with all string literal syntaxes.
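To make the proposed rule concrete, here is a hypothetical scanner (FindHoles is an invented name; nested braces and escaping are deliberately ignored) that finds interpolation holes when the literal starts with n '$' characters, so a hole opens with n - 1 '$' characters followed by '{'. Applied to the JavaScript example above with n = 3, only $${foo} is treated as a C# hole, while ${jsVariable} stays literal text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scanner for the proposed rule (assumes dollarCount >= 1): with n
// leading '$' characters, a hole opens with (n - 1) '$' characters followed by '{'.
// With n == 1 the opener is a bare '{', matching today's $"..." strings.
static List<string> FindHoles(string content, int dollarCount)
{
    string opener = new string('$', dollarCount - 1) + "{";
    var holes = new List<string>();
    int i = 0;
    while ((i = content.IndexOf(opener, i, StringComparison.Ordinal)) >= 0)
    {
        int start = i + opener.Length;
        int end = content.IndexOf('}', start);
        if (end < 0) break;                       // unterminated hole: stop scanning
        holes.Add(content.Substring(start, end - start));
        i = end + 1;
    }
    return holes;
}

string js = "var jsVariable = \"$${foo}\";\nconsole.log(`jsVariable is: ${jsVariable}`);";
foreach (var hole in FindHoles(js, 3))
    Console.WriteLine(hole);   // prints: foo
```

With n = 2 the same text would yield both foo and jsVariable, which is why the example needs three $ characters.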

Summary

I think there are 3 goals that we want to achieve (two already mentioned by @YairHalberstadt)

  1. Supporting the ability to define raw string literals which require no escaping.
  2. Trimming indentation whitespace from literals.
  3. Possibility to interpolate strings without having to deal with escaping issues (avoiding all those ugly {{ and }})

The current proposal, together with my string interpolation proposal, achieves all these goals. They can be thought about (and implemented) separately, but as a whole I think they would greatly improve the "dealing with strings" story in C#.

HaloFour commented 3 years ago

@mpawelski

Single line variant

The newline is necessary to terminate the string delimiter. Without it it's impossible to know whether the following is an empty string or a string containing two double quotes:

var text = """""""";
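Whether this is ambiguous depends on how the opening delimiter is lexed. As one hypothetical rule (not taken from the proposal), a lexer could treat the maximal run of quotes as the opener and end the literal at the next run of the same length. Under that rule, eight quotes are consumed entirely by the opener and no literal is produced, while a literal such as """ """ still lexes as a single space. LexSingleLine is an invented name.

```csharp
#nullable enable
using System;

// Hypothetical single-line lexing rule, for illustration only: the opening
// delimiter is the maximal run of '"' characters (at least three), and the
// literal ends at the next run of exactly that many quotes. Runs of any other
// length are treated as content. Returns null if no closing run is found.
static string? LexSingleLine(string source)
{
    int n = 0;
    while (n < source.Length && source[n] == '"') n++;
    if (n < 3) return null;

    int i = n;
    while (i < source.Length)
    {
        if (source[i] != '"') { i++; continue; }
        int run = 0;
        while (i + run < source.Length && source[i + run] == '"') run++;
        if (run == n) return source.Substring(n, i - n);
        i += run;
    }
    return null;
}

Console.WriteLine(LexSingleLine("\"\"\" \"\"\"") == " ");     // True: content is one space
Console.WriteLine(LexSingleLine(new string('"', 8)) is null);  // True: all eight quotes form the opener
```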

String interpolation

The purpose of raw strings is to eliminate processing of the string by the compiler. Adding interpolation into that seems like a contradiction.

merarischroeder commented 3 years ago

@HaloFour says

Without it it's impossible to know whether the following is an empty string or a string containing two double quotes:

It's possible.

var text = ""; is an empty string

var text = """ """; is a raw string with a single space character

I don't see a problem. Do you have an edge case in mind?

The purpose of raw strings is to eliminate processing of the string by the compiler.

Not according to the proposal above. This isn't about saving processing time for the compiler. There is the proposal above, and the previous thread that led up to this proposal thread.

Adding interpolation into that seems like a contradiction.

It depends on the goals. The goals and use-cases for strings should be guiding decisions.

Here is a raw string:

var text = """<animal name="greg"><tiger></tiger></animal>""";

Here is a raw string with interpolation

var text = $"""<animal name="{animal.Name}"></animal>""";

Interpolation on a raw base string is not a contradiction, it's useful.

Here is the interpolation without raw base string support:

var text = $"<animal name=\"{animal.Name}\"></animal>";

Code is easy to write, but harder to read. Readability is a very important goal for a programming language.

merarischroeder commented 3 years ago

@mpawelski

I like everything about your post - thanks!

About Java's reasoning:

Trailing white space is most often unintentional, idiosyncratic, and insignificant.

To be clear, in the Java spec they do remove the baseline indentation.

For readability, I think it's best to follow the same standards as other languages. What does Python do? I suspect they all discard the whitespace indentation baseline. Readability matters because reading code is harder than writing it.