dotnet / csharplang

The official repo for the design of the C# programming language

[Proposal]: [Indented Verbatim String] #4013

Closed Mafii closed 1 year ago

Mafii commented 4 years ago

Indented Verbatim String

Summary

Motivation

When writing multiline strings, I often reach a point where I want to indent the continuation lines so that the string doesn't cannibalize the indentation of the class.

Example of cannibalized indentation with verbatim strings:

namespace Test
{
    public class TestClass
    {
        public string SomeTestString => @"this is a string,
its content is spread over multiple lines,
and this can make it hard to read the structure of the class,
because the indentation is simply gone!";
    }
}

We can fix the indentation with spaces in the string, but this has a huge downside: the string will contain spaces for no reason!

namespace Test
{
    public class TestClass
    {
        public string ThisIsBad => @"this is a string,
                                    its content is spread over multiple lines,
                                    and its content contains unnecessary whitespace,
                                    but at least it's easy to read in the code?";
    }
}

Detailed design

I propose the use of the pipe (|) as the identifier of an indented (verbatim) string, when used in combination with the @, the identifier for a verbatim string.

Thus, an empty indented verbatim string would look like this:

string test = @|"";

Just like the $ sign for interpolated strings can be used with @, the pipe can be used in combination with @ (required) and $ (optional). The | may not start the group of identifiers in front of the string: @|"" is valid, but |@"" is not. This avoids ambiguity with the binary operator |, which can be valid in other contexts. While any ordering of $ and @ is valid, since each can be used on its own, it makes more sense for @| to be the only valid order, because the two only work together and | cannot exist on its own. The proposed feature name (Indented Verbatim String) therefore reflects the meaning of both identifiers.

Inside an indented verbatim string, we can use the | to set the indentation of each line, resulting in this:

namespace Test
{
    public class TestClass
    {
        public string SomeTestString => @|"this is a string,
                                         |its content is spread over multiple lines,
                                         |and it is easier to read the structure of the string
                                         |because the indentation is the same as the initial line!";
    }
}

The compiler rewrites the code to the equivalent of:

namespace Test
{
    public class TestClass
    {
        public string SomeTestString => @"this is a string,
its content is spread over multiple lines,
and it is easier to read the structure of the string
because the indentation is the same as the initial line!";
    }
}

If a developer wants to use the pipe (|) as a character in the string, they can escape it with \| or || (up for discussion).

Drawbacks

Problem 1: Mixing tabs and spaces is problematic. If we require | to be at the same indentation on every line inside the string, how do we handle a mixture of tabs and spaces? (Probably just disallow it? Devs who mix tabs and spaces (for indentation and alignment) would probably be disappointed, but there is no clear alternative.)

Problem 2: The behaviour of spaces after a pipe is not clear when reading the code. Do they introduce new whitespace?

e.g.:

public string SomeTestString => @|"this is a string,
                                 |its content is spread over multiple lines,
                                 |and it is easier to read the structure of the string
                                 |because the indentation is the same as the initial line!";

Will the difference from this be clear?

public string SomeTestString => @|"this is a string,
                                 | its content is spread over multiple lines,
                                 | and it is easier to read the structure of the string
                                 | because the indentation is the same as the initial line!";

Problem 3: Do we encourage bad newline design here? Wouldn't Environment.NewLine be better than baking in the newline character(s) of the source file?

e.g.:

public string SomeTestString => $@"some text{Environment.NewLine
                                  }that can be indented, too.{Environment.NewLine
                                  }because if the newline is inside the brackets{Environment.NewLine
                                  }it behaves as if it weren't part of the string!";

Is that the better solution to the multiline problem, and should this proposal be avoided so that developers aren't encouraged to ignore the platform-dependent newline concern?

Alternatives

Of course, other characters could be used to indent the string. I don't know which one would fit better.

When using an interpolated verbatim string, braces and newlines can be abused to achieve the same result; see Problem 3 in Drawbacks.

Unresolved questions

Unresolved question 1: Do we support different indentations of | on subsequent lines in an indented verbatim string?

According to the proposal, the following code snippet would be valid:

public string SomeTestString => @|"this is a string,
                                 |its content is spread over multiple lines,
                                 |and it is easier to read the structure of the string
                                 |because the indentation is the same as the initial line!";

But is this code snippet valid, too?

public string SomeTestString => @|"this is a string,
                                 |its content is spread over multiple lines,
                         |and it is not easier to read the structure of the string
                                                   |because the indentation is ALL OVER THE PLACE!";

And this?

public string SomeTestString => @|"this is a string,
                                  |its content is spread over multiple lines,
                                  |and it is easier to read the structure of the string
                                  |because the indentation is the same on every subsequent line, but different from the initial line!";

And this?

public string SomeTestString => @|"this is a string,
                             |its content is spread over multiple lines,
                             |and it is easier to read the structure of the string
                             |because the indentation is the same as the initial line!";

Unresolved question 2: How do we escape the | character in the string?

Option 1: @|"\|" is equivalent to @"|" (similar to \\, \t, \n, etc.)

Option 2: @|"||" is equivalent to @"|" (similar to {{, }} and "")

I think option 1 is more intuitive.

YairHalberstadt commented 4 years ago

Scala does this with a method on strings (stripMargin). A C# equivalent, here called Trim(), might look like this:

public string SomeTestString => @"this is a string,
                                 |its content is spread over multiple lines,
                                 |and it is easier to read the structure of the string
                                 |because the indentation is the same as the initial line!".Trim();

Couldn't you write an extension method on string, which will strip whitespace for you?

If you want to avoid the allocation, you could cache it, or even use a source generator to return the correct string using the technique described here. The lack of CallerColumnNumber shouldn't matter here because there's at most one such string per line.
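
For illustration, here is a minimal sketch of what such an extension method could look like. The names MarginExtensions and StripMargin are made up for this example (nothing like them exists in the BCL), and the rule it applies, dropping leading spaces or tabs plus the first | on every line after the first, is only one possible reading of the proposal.

```csharp
using System.Text;

// Hypothetical helper, not part of the BCL: removes the leading whitespace and
// the margin character from every continuation line of a multi-line string.
public static class MarginExtensions
{
    public static string StripMargin(this string text, char margin = '|')
    {
        string[] lines = text.Split('\n');

        // The first line starts right after the opening quote, so it is kept as-is.
        var result = new StringBuilder(lines[0]);

        for (int i = 1; i < lines.Length; i++)
        {
            string line = lines[i];
            string trimmed = line.TrimStart(' ', '\t');

            result.Append('\n');
            // Strip only when the first non-whitespace character is the margin;
            // otherwise leave the line untouched.
            result.Append(trimmed.Length > 0 && trimmed[0] == margin
                ? trimmed.Substring(1)
                : line);
        }

        return result.ToString();
    }
}
```

Because the work happens at run time, every call allocates a new string unless the result is cached, which is the cost discussed above. A trailing \r from Windows line endings is left in place, so the line endings of the source file are preserved.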

YairHalberstadt commented 4 years ago

If performance was really important, and you wanted to avoid any allocations, or locking, a source generator could generate the following:

using System.Collections.Generic;

internal static class StringExtensions
{
    private static readonly Dictionary<string, string> _dic = new Dictionary<string, string>(ReferenceEqualityComparer.Instance)
    {
        [@"this is a string,
                                         |its content is spread over multiple lines,
                                         |and it is easier to read the structure of the string
                                         |because the indentation is the same as the initial line!"] =
            @"this is a string,
its content is spread over multiple lines,
and it is easier to read the structure of the string
because the indentation is the same as the initial line!",
    };

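    // Note: a real extension would need a name other than Trim, because the
    // instance method string.Trim() takes precedence over an extension method.
    public static string Trim(this string str) => _dic[str];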
}

In practice though if performance was really important, you would just write your strings in the slightly uglier fashion, and skip all of this :-). Most of the time this would be used in code where having to format on the first use, and then use a concurrent dictionary from then on is an acceptable cost.

YairHalberstadt commented 4 years ago

BTW none of the above is to say this is a bad idea - it's a very reasonable suggestion. It's just it's best to explore all existing avenues before choosing to add a new language feature 😄

Flutterish commented 4 years ago

How about we just discard the first line in a @| string?

...
    var myText = @|" // this is not a part of the text
        |this is text
        |and so is this
        |and this"
...

Or use the statement's indentation rather than the string's?

...
    var myText = @" // also not text. This whole issue exists because of the extra `"` after the `|`
    this is text
    and so is this
    and this"
...

HaloFour commented 4 years ago

Java's text blocks trim out the left margin but they determine it based on the whitespace around the declaration:

var withoutWhitespace = """
                        <html>
                            <body>
                                <p>Hello World</p>
                            </body>
                        </html>
                        """;

var withWhitespace = """
                     <html>
                         <body>
                             <p>Hello World</p>
                         </body>
                     </html>
""";

Mafii commented 4 years ago

@Flutterish I think statement indentation is somewhat flawed because you can't distinguish normal code from the second line of such a string without syntax highlighting from an IDE (e.g. when you inspect a file with Notepad or on a website that doesn't colour strings).

What I dislike about the // this is not part of the text approach is that it's orthogonal to how strings and " behave at the moment. I think it would be good for this feature if changing @ to @| in a codebase did not break the behaviour that a string starts with the first character after the ".

...
    var myText = @|" // this is not a part of the text, but some devs would expect it to be
        |this is text
        |and so is this
        |and this"
...

Mafii commented 4 years ago

@HaloFour thanks for the link. I think it adds a lot to the discussion.

I'm not sure what to think about that Java feature - I like how clean it looks and how easy it is to use, but the removed whitespace is non-explicit and non-intuitive at first, and especially hard to track by eye in strings spanning many lines.

The equivalent of this feature request's example using the Java syntax would be:


public string SomeTestString => """
                                this is a string,
                                its content is spread over multiple lines,
                                and it is easier to read the structure of the string
                                because the indentation is the same as the initial line!""";

HaloFour commented 4 years ago

I'm not sure what to think about that Java feature - I like how clean it looks and how easy it is to use, but the removed whitespace is non-explicit and non-intuitive at first, and especially hard to track by eye in strings spanning many lines.

The tooling experience does a good job of indicating the effective whitespace:

[image]

leandromoh commented 4 years ago

F# has backslash strings that indent string contents by stripping leading spaces.

let poem = 
    "The lesser world was daubed\n\
     By a colorist of modest skill\n\
     A master limned you in the finest inks\n\
     And with a fresh-cut quill."

F# also has triple-quoted strings, which let us avoid escaping altogether:

let tripleXml = """<book title="Paradise Lost">"""

See it running on sharplab

I liked the approach of backslash strings. For triple-quoted strings, I would just suggest that C# delimit them with the backtick (`) (grave accent) character instead of triple double quotes, for convenience.

Source: http://dungpa.github.io/fsharp-cheatsheet/

tacosontitan commented 4 years ago

Couldn't we, in theory simply update based on the column for the start of the string? For example:

public string testString = @|"This is a verbatim string that infers indentation based on the column of the start of the string.
                              This is line 2.
                                  This is line 3.";

The first character of line one in the string here is at column index (zero-based) 30. The second line also starts at column index 30, while the 3rd line begins at column index 34. In this case, if it is defined that adding the | allows the compiler to infer indentation based on the column of the first character in the string, then it should work just fine. The biggest issue I see is maybe compile time performance.

Flutterish commented 4 years ago

@tacosontitan Tabs do not play nicely with spaces: 4 spaces (4 chars, 4 columns wide) are not the same as 1 tab (1 char, variable width).

tacosontitan commented 4 years ago

@Flutterish couldn't we look at the user settings to determine whether the user is using tabs or spaces and how many spaces each tab represents? I could swear both are environment variables. I also believe tabs show up as hidden \t characters? You could then convert to spaces as needed based on those settings; but I could be entirely wrong here. Just trying to help 😀

dstarkowski commented 4 years ago

@Flutterish couldn't we look at the user settings to determine whether the user is using tabs or spaces and how many spaces each tab represents? I could swear both are environment variables. I also believe tabs show up as hidden \t characters? You could then convert to spaces as needed based on those settings; but I could be entirely wrong here. Just trying to help 😀

In VS, the use of tabs or spaces for indentation is configured either by VS settings or by .editorconfig. As far as I'm aware, the compiler knows about neither.

If the compiler did what you're suggesting, the code would mean something different on my machine than it does on yours. C# has always been whitespace-insensitive and has played well with both tabs and spaces. Changing that now would surely create friction.

jrmoreno1 commented 4 years ago

There is a developer community request for a similar feature, except that the indentation is VIRTUAL (only shows in the IDE), and thus no need for a language change. https://developercommunity.visualstudio.com/idea/602807/indent-multi-line-verbatim-strings.html?childToView=1230559#comment-1230559

Mafii commented 4 years ago

@jrmoreno1 the problem with a virtual feature in the IDE is that users of Visual Studio Code, JetBrains Rider, Vim, or even Notepad are left behind, which I think is sad (I personally haven't used Visual Studio for a few months because I use Rider).

jrmoreno1 commented 4 years ago

@Mafii: yes, that's a downside. But it is the upside as well: if you look at the code as just text, you see what the compiler sees. If it were done by a language change, I would also expect IDE changes for optimum results, and you wouldn't have those in vim or notepad.

Pyrdacor commented 3 years ago

How about trimming all tabs but not spaces? You can indent the start of the string with tabs and do the indentation inside the string with spaces or \t. This way it would not depend on the IDE. Moreover, Visual Studio or other editors could add a feature to start each new line of such a string with a specific number of tabs. Win-win for all.

var foo = """<!-- Test -->
____<html>
____    <head>
____    </head>
____</html>""";

Or

var foo = """<!-- Test -->
____<html>
____\t<head>
____\t</head>
____</html>""";

I used underscores to show actual tabs here.

Tragen commented 3 years ago

I don't like the idea of having \t or something inside the string, that goes against the intention of this request to copy paste a string without having to modify it.

theunrepentantgeek commented 3 years ago

How about trimming all tabs but not spaces? You can indent the start of the string with tabs and do the indentation inside the string with spaces or \t.

I think you've just managed to offend everyone who cares about the spaces-vs-tabs debate - which is, within a rounding error, 100% everyone I've worked with who has had to work across multiple languages.

C# has never been a whitespace sensitive language. There are any number of teams who apply automatic formatting to their code, either routinely eliminating tabs in favor of spaces, or vice versa. Some do this to ensure conformity - and some do it to permit variance (allowing their developers to automatically reformat code however they like, secure in the knowledge that the changes aren't inflicted upon anyone else).

Making the language whitespace sensitive - a breaking change - after 20 years of widespread use and literally hundreds of billions of lines of code is simply a non-starter.

Pyrdacor commented 3 years ago

I don't like the idea of having \t or something inside the string, that goes against the intention of this request to copy paste a string without having to modify it.

Some other ideas suggested the pipe operator in front of each line which would be even worse. Adding the \t after pasting is one replace operation which is done in 2 seconds or in a blink of an eye if you put it on a shortcut in your editor. This could be also done automatically by an IDE feature when pasting into a surrounding verbatim string.

Pyrdacor commented 3 years ago

How about trimming all tabs but not spaces? You can indent the start of the string with tabs and do the indentation inside the string with spaces or \t.

I think you've just managed to offend everyone who cares about the spaces-vs-tabs debate - which is, within a rounding error, 100% everyone I've worked with who has had to work across multiple languages.

C# has never been a whitespace sensitive language. There are any number of teams who apply automatic formatting to their code, either routinely eliminating tabs in favor of spaces, or vice versa. Some do this to ensure conformity - and some do it to permit variance (allowing their developers to automatically reformat code however they like, secure in the knowledge that the changes aren't inflicted upon anyone else).

Making the language whitespace sensitive - a breaking change - after 20 years of widespread use and literally hundreds of billions of lines of code is simply a non-starter.

Maybe do it like with nullables. Add a switch to specify one of 3 modes:

#whitespace space
#whitespace tab
#whitespace both

The last mode is the default and will be as it is today. Verbatim strings with indentation won't work with this active.

The other modes force a specific kind of whitespace and the other one is not permitted. This could also help teams by producing compile errors, or errors while typing, if they use the wrong whitespace to indent. The IDE can support this as well. As it is an optional feature, it won't break any code.

I never liked it that I can exchange tabs and spaces to my liking. A switch to get rid of it would be nice regardless of verbatim strings. But I guess this would be for a different proposal.

CyrusNajmabadi commented 3 years ago

Any change that makes C# whitespace-sensitive in this fashion will be voted against by me.

Add a switch to specify one of 3 modes:

We are definitely not adding dialects over something like string literals. It's not happening.

OJacot-Descombes commented 3 years ago

This could be solved by an IDE feature entirely, with no language change involved. I.e. the verbatim string would physically remain unindented, but the IDE would visually indent the continuation lines and indicate the effective whitespace with a vertical line. Viewed in Notepad the example from the Motivation section above would still look "cannibalized". Advantage: the feature would be immediately available in existing code.

I have proposed it here: Indent multi-line verbatim strings

lechu445 commented 3 years ago

There is also a proposal of multi-line indent in context of raw strings https://github.com/dotnet/csharplang/issues/4304

rabadiw commented 3 years ago

What about this use case?

[Fact(DisplayName = @"
    As <persona>,
    I want <what?>
    So that <why?>.")]
public void Test() { }

It would be great to remove whitespace and replace line breaks with a single space. The result would be readable code, cleaner readable test output report on the cmd line, and cleaner and readable output in the IDE unit test tools panels.

CyrusNajmabadi commented 3 years ago

The result would be readable code

This doesn't look readable to me (imo). It all looks like there would be newlines there. Having those newlines become spaces doesn't seem intuitive to me at all.

rabadiw commented 3 years ago

Let me clarify. I left a few things open to discussion as not to influence a direction on the approach just yet. The example listed earlier was to prove a use case.

As of today, the output of [Fact] is as the attachment shows (with spaces and \n). However, it would be preferable to be able to produce the output "As <persona>, I want <what?> So that <why?>."

[image: rider_unit_tests_ex_output]

C# has no compile-time way to parse or transform a multiline string, and none of the approaches described in earlier comments provides one. As in the example above, an attribute adds a layer of complexity that constrains what we can do.
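
To make the constraint concrete, here is a minimal sketch of the transformation as a run-time helper. DisplayNameText and CollapseWhitespace are hypothetical names chosen for this example. Because attribute arguments must be compile-time constants, a helper like this cannot be called inside [Fact(DisplayName = ...)], which is exactly the limitation described above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: trims the text and replaces every run of whitespace
// (spaces, tabs, line breaks) with a single space.
public static class DisplayNameText
{
    public static string CollapseWhitespace(string text) =>
        Regex.Replace(text.Trim(), @"\s+", " ");
}
```

Applied to the multi-line display name from the earlier example, this produces "As <persona>, I want <what?> So that <why?>.", but only in places where a method call is allowed.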

CyrusNajmabadi commented 3 years ago

I think it would be easier to read and understand as:

[Fact(DisplayName = @"As <persona>, I want <what?> So that <why?>.")]
public void Test() { }

cryolithic commented 3 years ago

Why not allow for classic C style concatenation?

var someString = "This string will"
                 "be concatenated"
                 "at Compile Time.";

Now in C we would need to explicitly add the newline, but I see no reason to be bound by that convention here.

This also prevents making people like me who understand the one true way really annoyed.
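
For comparison, C# has no adjacent-literal concatenation, but the + operator between string constants is a constant expression, so the compiler folds it into a single literal. The sketch below uses a made-up Messages class; the newlines still have to be written explicitly, as in C.

```csharp
public static class Messages
{
    // Concatenating string constants with + is evaluated at compile time,
    // so the result can be used in const declarations and attribute arguments.
    public const string Joined =
        "This string will\n" +
        "be concatenated\n" +
        "at compile time.";
}
```

This keeps the indentation of the surrounding code intact, at the cost of quoting every line and escaping its content.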

rabadiw commented 3 years ago

I think it would be easier to read and understand as:

[Fact(DisplayName = @"As <persona>, I want <what?> So that <why?>.")]
public void Test() { }

Preference is a highly debatable topic and a meaningless endeavor. Would this debate be any different on the topic of comments allowing single vs multi lined?

Options are the key here!

It would be nice to have a margin trim with prefix and postfix, expanding on the capabilities introduced by Kotlin's trimMargin.

tacosontitan commented 3 years ago

Couldn't this be accomplished by simply starting on the next line? You'll sacrifice two characters of placement for the first line, but everything else lines up:

private string _verbatim =
@"something
is awesome about this.";

HaloFour commented 3 years ago

The raw strings proposal which has been referenced a couple of times here already includes automatic trimming of the string literal, done by the compiler at compile-time.

Mafii commented 3 years ago

@tacosontitan I disagree, my main reason for disliking verbatim strings is that if I use them in a code block, the block suddenly has 0 indentation instead of the same one the rest of the code has. It makes reading through the rest of the file harder as it seems like it's top level, but it isn't at all.

So your solution doesn't solve my problem at all. :)

CyrusNajmabadi commented 3 years ago

As mentioned, the raw string literals proposal handles this case.

tacosontitan commented 3 years ago

@tacosontitan I disagree, my main reason for disliking verbatim strings is that if I use them in a code block, the block suddenly has 0 indentation instead of the same one the rest of the code has. It makes reading through the rest of the file harder as it seems like it's top level, but it isn't at all.

So your solution doesn't solve my problem at all. :)

Not trying to be argumentative at all, but if the verbatim string is distracting, you can use region directives to show and hide it as needed:

#region Verbatim String

    private string _verbatim =
@"verbatim
verbatim
something
else";

#endregion

However, I concur that this would be a nice feature, and I agree with the linked raw strings proposal 😄

roji commented 1 year ago

Can this be closed now that raw string literals have been introduced? They solve the indentation problem.
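
For readers arriving here later, this is roughly how the example from the Motivation section could be written with a C# 11 raw string literal. The string content is adapted for illustration. The whitespace in front of the closing """ defines the indentation that the compiler removes from every content line.

```csharp
namespace Test
{
    public class TestClass
    {
        // Raw string literal (C# 11): the line break after the opening """ and
        // the one before the closing """ are not part of the string, and the
        // indentation of the closing """ is stripped from each content line.
        public string SomeTestString => """
            this is a string,
            its content is spread over multiple lines,
            and the structure of the class stays readable,
            because the common indentation is removed by the compiler!
            """;
    }
}
```

The resulting string starts with "this is a string," and has no leading whitespace on any of its four lines.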