Closed Mafii closed 1 year ago
Scala does this by having a trim() method on string:
public string SomeTestString => @"this is a string,
|its content is spread over multiple lines,
|and it is easier to read the structure of the string
|because the indentation is the same as the initial line!".Trim();
Couldn't you write an extension method on string, which will strip whitespace for you?
If you want to avoid the allocation, you could cache it, or even use a source generator to return the correct string using the technique described here. The lack of CallerColumnNumber shouldn't matter here because there's at most one such string per line.
If performance was really important, and you wanted to avoid any allocations, or locking, a source generator could generate the following:
using System.Collections.Generic;
internal static class StringExtensions
{
private static readonly Dictionary<string, string> _dic = new Dictionary<string, string>(ReferenceEqualityComparer.Instance)
{
[@"this is a string,
|its content is spread over multiple lines,
|and it is easier to read the structure of the string
|because the indentation is the same as the initial line!"] =
@"this is a string,
its content is spread over multiple lines,
and it is easier to read the structure of the string
because the indentation is the same as the initial line!",
};
public static string Trim(string str) => _dic[str];
}
In practice though if performance was really important, you would just write your strings in the slightly uglier fashion, and skip all of this :-). Most of the time this would be used in code where having to format on the first use, and then use a concurrent dictionary from then on is an acceptable cost.
BTW none of the above is to say this is a bad idea - it's a very reasonable suggestion. It's just it's best to explore all existing avenues before choosing to add a new language feature 😄
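For reference, the extension-method idea above can be sketched roughly like this. The names MarginExtensions and StripMargin and the default '|' margin character are made up for illustration (they are not an existing .NET API), and the cache is just one way to pay the formatting cost only on first use:

```csharp
using System;
using System.Collections.Concurrent;

// Usage: the '|' marks where each continuation line really starts.
var text = @"this is a string,
            |its content is spread over multiple lines".StripMargin();
Console.WriteLine(text);

internal static class MarginExtensions
{
    // Cache keyed by (literal, margin character) so each distinct input is processed once.
    private static readonly ConcurrentDictionary<(string Value, char Margin), string> Cache = new();

    public static string StripMargin(this string value, char marginChar = '|') =>
        Cache.GetOrAdd((value, marginChar), key => Strip(key.Value, key.Margin));

    private static string Strip(string value, char marginChar)
    {
        var lines = value.Split('\n');
        for (var i = 0; i < lines.Length; i++)
        {
            // Drop leading spaces/tabs plus the margin character; lines without a margin stay untouched.
            var trimmed = lines[i].TrimStart(' ', '\t');
            if (trimmed.Length > 0 && trimmed[0] == marginChar)
                lines[i] = trimmed.Substring(1);
        }
        return string.Join("\n", lines);
    }
}
```

This still allocates and does a dictionary lookup on every call, which is the cost discussed above.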
How about we just discard the first line in a @| string?
...
var myText = @|" // this is not a part of the text
|this is text
|and so is this
|and this"
...
Or use the statement's indentation rather than the string's?
...
var myText = @" // also not text. This whole issue exists because of the extra `"` after the `|`
this is text
and so is this
and this"
...
Java's text blocks trim out the left margin but they determine it based on the whitespace around the declaration:
var withoutWhitespace = """
<html>
<body>
<p>Hello World</p>
</body>
</html>
""";
var withWhitespace = """
<html>
<body>
<p>Hello World</p>
</body>
</html>
""";
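To make the "trim out the left margin" idea concrete, here is a rough C# sketch of removing the common leading whitespace from a block of lines. It is a simplification and not a port of Java's exact rules (for instance, it ignores the position of the closing delimiter and counts a tab as one character); StripCommonIndent is a hypothetical helper name:

```csharp
using System;
using System.Linq;

// Removes the smallest leading-whitespace run shared by all non-blank lines.
static string StripCommonIndent(string text)
{
    var lines = text.Split('\n');
    var indent = lines
        .Where(l => l.Trim().Length > 0)                        // blank lines do not count
        .Select(l => l.Length - l.TrimStart(' ', '\t').Length)  // leading spaces/tabs per line
        .DefaultIfEmpty(0)
        .Min();
    return string.Join("\n", lines.Select(l =>
        l.Length >= indent ? l.Substring(indent) : l.TrimStart(' ', '\t')));
}

var html = StripCommonIndent("    <html>\n        <body/>\n    </html>");
Console.WriteLine(html);
// <html>
//     <body/>
// </html>
```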
@Flutterish I think statement indentation is somewhat flawed, because you can't distinguish normal code from the second line of such a string without syntax coloring from an IDE (e.g. when you inspect a file with Notepad, or on a website without support for colored strings).
What I dislike about the // this is not part of the text is that it's orthogonal to how strings and " behave at the moment.
I think it would be good for this feature if changing @ to @| in a codebase would not break the behaviour that a string starts with the first character after the ".
...
var myText = @|" // this is not a part of the text, but some devs would expect it to be
|this is text
|and so is this
|and this"
...
@HaloFour thanks for the link. I think it adds a lot to the discussion.
I'm not sure what to think about that Java feature - I like how clean it looks, and how easy it is to use, but the removed whitespace is non-explicit and non-intuitive at first, and especially hard to track by eye in strings spanning many lines.
The equivalent of this feature request's example using the Java syntax would be
public string SomeTestString => """
this is a string,
its content is spread over multiple lines,
and it is easier to read the structure of the string
because the indentation is the same as the initial line!""";
The tooling experience does a good job of indicating the effective whitespace:
F# has backslash strings that indent string contents by stripping leading spaces.
let poem =
"The lesser world was daubed\n\
By a colorist of modest skill\n\
A master limned you in the finest inks\n\
And with a fresh-cut quill."
F# also has triple-quoted strings, in which we don't even have to escape quotes:
let tripleXml = """<book title="Paradise Lost">"""
See it running on sharplab
I like the approach of backslash strings, and for triple-quoted strings I would just suggest enclosing them in C# with the backtick (`) (grave accent) character instead of triple double quotes, for convenience.
Couldn't we, in theory simply update based on the column for the start of the string? For example:
public string testString = @|"This is an verbatim string that infers indentation based on the column of the start of the string.
This is line 2.
This is line 3.";
The first character of line one in the string here is at column index (zero-based) 30. The second line also starts at column index 30, while the 3rd line begins at column index 34. In this case, if it is defined that adding the | allows the compiler to infer indentation based on the column of the first character in the string, then it should work just fine. The biggest issue I see is maybe compile-time performance.
@tacosontitan Tabs do not play nice with spaces, that is 4 spaces ( 4 chars, 4 columns wide ) is not the same as 1 tab ( 1 char, variable width )
@Flutterish couldn't we look at the user settings to determine whether the user is using tabs or spaces and the number of spaces each tab uses? I could swear both are environment variables. Then I believe tabs show up as hidden characters \t? You could then convert to spaces as needed based on those settings; but I could be entirely wrong here. Just trying to help 😀
In VS, the use of tabs or spaces for indentation is configured either by VS settings or .editorconfig. As far as I'm aware, the compiler knows neither.
If the compiler did what you're suggesting, the code would mean something else on my machine than it does on yours. C# was always whitespace insensitive and played well with both tabs and spaces. Changing it now would surely create friction.
There is a developer community request for a similar feature, except that the indentation is VIRTUAL (only shows in the IDE), and thus no need for a language change. https://developercommunity.visualstudio.com/idea/602807/indent-multi-line-verbatim-strings.html?childToView=1230559#comment-1230559
@jrmoreno1 the problem with a virtual feature in the IDE is that users of Visual Studio Code, JetBrains Rider, Vim, or even Notepad are left behind, which I think is sad (I personally haven't used Visual Studio for a few months because I use Rider).
@Mafii: yes, that’s a downside. But that is the upside as well — if you look at the code as just text, you see what the compiler sees. If it is done by a language change, I would also expect IDE changes as well for optimum results and you wouldn’t have those in vim or notepad.
How about trimming all tabs but not spaces? You can indent the string start with tabs and do the indentation inside the string with spaces or \t. This way it would not depend on the IDE. Moreover, Visual Studio or other editors could add a feature to start a new line of such a string with a specific number of tabs. Win-win for all.
var foo = """<!-- Test -->
____<html>
____ <head>
____ </head>
____</html>""";
Or
var foo = """<!-- Test -->
____<html>
____\t<head>
____\t</head>
____</html>""";
I used underscores to show actual tabs here.
I don't like the idea of having \t or something inside the string, that goes against the intention of this request to copy paste a string without having to modify it.
How about trimming all tabs but not spaces. You can indent the string start with tabs and the indentation inside the string with spaces or \t.
I think you've just managed to offend everyone who cares about the spaces-vs-tabs debate - which is, within a rounding error, 100% everyone I've worked with who has had to work across multiple languages.
C# has never been a whitespace sensitive language. There are any number of teams who apply automatic formatting to their code, either routinely eliminating tabs in favor of spaces, or vice versa. Some do this to ensure conformity - and some do it to permit variance (allowing their developers to automatically reformat code however they like, secure in the knowledge that the changes aren't inflicted upon anyone else).
Making the language whitespace sensitive - a breaking change - after 20 years of widespread use and literally hundreds of billions of lines of code is simply a non-starter.
I don't like the idea of having \t or something inside the string, that goes against the intention of this request to copy paste a string without having to modify it.
Some other ideas suggested the pipe operator in front of each line which would be even worse. Adding the \t after pasting is one replace operation which is done in 2 seconds or in a blink of an eye if you put it on a shortcut in your editor. This could be also done automatically by an IDE feature when pasting into a surrounding verbatim string.
Maybe do it like with nullables. Add a switch to specify one of 3 modes:
#whitespace space
#whitespace tab
#whitespace both
The last mode is the default and will be as it is today. Verbatim strings with indentation won't work with this active.
The other modes force a specific whitespace character and the other one is not permitted. This could also help teams by giving compile errors, or errors while typing, if they use the wrong whitespace to indent. The IDE can support this as well. As it is an optional feature, it won't break any code.
I never liked it that I can exchange tabs and spaces to my liking. A switch to get rid of it would be nice regardless of verbatim strings. But I guess this would be for a different proposal.
Any change that makes C# whitespace-sensitive in this fashion will be voted against by me.
Add a switch to specify one of 3 modes:
We are definitely not adding dialects over something like string literals. It's not happening.
This could be solved by an IDE feature entirely, with no language change involved. I.e. the verbatim string would physically remain unindented, but the IDE would visually indent the continuation lines and indicate the effective whitespace with a vertical line. Viewed in Notepad the example from the Motivation section above would still look "cannibalized". Advantage: the feature would be immediately available in existing code.
I have proposed it here: Indent multi-line verbatim strings
There is also a proposal of multi-line indent in context of raw strings https://github.com/dotnet/csharplang/issues/4304
What about this use case?
[Fact(DisplayName = @"
As <persona>,
I want <what?>
So that <why?>.")]
public void Test() { }
It would be great to remove whitespace and replace line breaks with a single space. The result would be readable code, cleaner readable test output report on the cmd line, and cleaner and readable output in the IDE unit test tools panels.
The result would be readable code
This doesn't look readable to me (imo). It all looks like there would be newlines there. Having those newlines become spaces doesn't seem intuitive to me at all.
Let me clarify. I left a few things open to discussion as not to influence a direction on the approach just yet. The example listed earlier was to prove a use case.
As of today, the output of [Fact] is as the attachment shows (with spaces and \n). However, it would be more ideal to be able to produce the output as As <persona>, I want <what?> So that <why?>.
The C# language does not have methods to parse/transform a multiline string. None as described in earlier comments. As in the example listed before, an attribute adds a layer of complexity that constrains what we can do.
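Outside of attributes, the transformation asked for here can at least be done at run time. A minimal sketch, assuming that collapsing every run of whitespace (including line breaks) into one space is acceptable; CollapseWhitespace is a hypothetical helper name. It cannot be called inside [Fact(DisplayName = ...)], because attribute arguments must be compile-time constants:

```csharp
using System;
using System.Text.RegularExpressions;

// Trims the ends, then replaces each run of whitespace (spaces, tabs, line breaks) with one space.
static string CollapseWhitespace(string text) =>
    Regex.Replace(text.Trim(), @"\s+", " ");

var name = CollapseWhitespace(@"
    As <persona>,
    I want <what?>
    So that <why?>.");
Console.WriteLine(name); // As <persona>, I want <what?> So that <why?>.
```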
I think it would be easier to read and understand as:
[Fact(DisplayName = @"As <persona>, I want <what?> So that <why?>.")]
public void Test() { }
Why not allow for classic C style concatenation?
var someString = "This string will"
"be concatenated"
"at Compile Time.";
Now in C we would need to explicitly add the newline, but I see no reason to be bound by that convention here.
This also prevents making people like me who understand the one true way really annoyed.
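For comparison, C# already gets close to this with the + operator: concatenating string constants is itself a constant expression, so the compiler joins them at compile time and the result can be used wherever a constant is required, attribute arguments included. A small sketch (the constant name is arbitrary):

```csharp
using System;

// Each operand is a constant, so the whole expression is evaluated at compile time.
const string SomeString =
    "This string will " +
    "be concatenated " +
    "at compile time.";

Console.WriteLine(SomeString); // This string will be concatenated at compile time.
```

As in C, spaces and newlines between the pieces have to be written explicitly.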
I think it would be easier to read and understand as:
[Fact(DisplayName = @"As <persona>, I want <what?> So that <why?>.")] public void Test() { }
Preference is a highly debatable topic and a meaningless endeavor. Would this debate be any different on the topic of comments allowing single vs multi lined?
Options are the key here!
It would be nice to have a margin trim with prefix & postfix, expanding on the capabilities introduced by Kotlin's trimMargin.
Couldn't this be accomplished by simply starting on the next line? You'll sacrifice two characters of placement for the first line, but everything else lines up:
private string _verbatim =
@"something
is awesome about this.";
The raw strings proposal which has been referenced a couple of times here already includes automatic trimming of the string literal, done by the compiler at compile-time.
@tacosontitan I disagree, my main reason for disliking verbatim strings is that if I use them in a code block, the block suddenly has 0 indentation instead of the same one the rest of the code has. It makes reading through the rest of the file harder as it seems like it's top level, but it isn't at all.
So your solution doesn't solve my problem at all. :)
As mentioned, the raw string literals proposal handles this case.
Not trying to be argumentative at all, but if the verbatim string is distracting, you can use region directives to show and hide it as needed:
#region Verbatim String
private string _verbatim =
@"verbatim
verbatim
something
else";
#endregion
However, I concur that this would be a nice feature, and I agree with the linked raw strings proposal 😄
Can this be closed now that raw string literals have been introduced? They solve the indentation problem.
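For anyone landing here later, a short sketch of how a raw string literal (C# 11 and later) handles the indentation: the whitespace in front of the closing """ is removed from every line, and the line breaks right after the opening and right before the closing delimiter are not part of the value. The variable name is arbitrary:

```csharp
using System;

var text =
    """
    this is a string,
        this line keeps four extra spaces
    """;

// The first line starts at column zero of the value; only the extra indentation survives.
Console.WriteLine(text);
```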
Indented Verbatim String
Summary
Motivation
When writing multiline strings, I have often come to a point where I want to indent the text on the new line, so that it doesn't cannibalize the indentation of the class.
Example of cannibalized indentation with verbatim strings:
We can fix the indentation with spaces in the string, but this has a huge downside: the string will contain spaces for no reason!
Detailed design
I propose the use of the pipe (|) as the identifier of an indented (verbatim) string, when used in combination with the @, the identifier for a verbatim string.
Thus, an empty indented verbatim string would look like this:
Just like the $ sign for interpolated strings can be used with @, the pipe can be used in combination with @ (required) and $ (optional). The | may not start the group of identifiers in front of the string. This means that while @|"" is valid, |@"" is not. This makes it easier to avoid problems with the binary operator |, which can be valid in different contexts. While any ordering of $ and @ is valid, as they can be used on their own, it makes more sense to see @| as the only valid order, as they only work together, and | cannot exist on its own. The proposed feature name (Indented Verbatim String) therefore contains the meaning of both keywords.
Inside an indented verbatim string, we can use the | to set the indentation of the line, resulting in this:
The compiler rewrites the code to the equivalent of:
If a developer wants to use the pipe (|) as a character of the string, they can escape it with \| or || (up for discussion).
Drawbacks
Problem 1: Tabs / whitespace fights are problematic: if we require | to be at the same indentation on every line inside the string, how do we handle a mixture of tabs and spaces? (Probably just disallow that? Devs mixing tabs and spaces (for indentation and alignment) would probably be disappointed by this, but there is no clear way of doing it otherwise.)
Problem 2: The behaviour of spaces after a pipe is not clear when reading the code. Do they introduce new whitespace?
e.g.:
will the difference be clear to
Problem 3: Do we encourage bad newline design here? Wouldn't Environment.NewLine be better instead of having the newline character(s) of the code-base?
e.g.:
is that the better solution to the multiline problem, and should this proposal be avoided so that developers aren't encouraged to ignore the platform-dependent newline concern?
Alternatives
Of course, other characters could be used to indent the string. I don't know which one would fit better.
When using an interpolated verbatim string, brackets and newlines can be abused to achieve the same, see Problem 3 in Drawbacks.
Unresolved questions
Unresolved question 1: Do we support different indentations of | on subsequent lines in an Indented Verbatim String?
According to the proposal, the following code snippet would be valid:
But is this code snippet valid, too?
And this?
And this?
Unresolved question 2: How do we escape the | character in the string?
Option 1: @|"\|" is equivalent to @"|" (similar to \\, \t, \n, etc.)
Option 2: @|"||" is equivalent to @"|" (similar to {{, }} and "")
I think option 1 is more intuitive.