dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.28k stars 4.74k forks source link

[API Proposal]: Boolean format strings #67388

Open OronDF343 opened 2 years ago

OronDF343 commented 2 years ago

Background and motivation

Simply printing a boolean value will always result in True or False. In some cases, it may be desirable to print a different constant value based on a boolean value when using string interpolation. This could be made easier by introducing format strings for bool.

For example, here is a naive implementation without string interpolation:

public override string ToString()
{
    return "Id: " + Id + " - " + (IsEnabled ? "Enabled" : "Disabled");
}

Currently, there is no clean way to implement the above method with string interpolation. Examples:

// 1. Ternary expression inside
return $"Id: {Id} - {(IsEnabled ? "Enabled" : "Disabled")}";
// 2. Ternary expression outside
return IsEnabled ? $"Id: {Id} - Enabled" : $"Id: {Id} - Disabled";

Adding standard and custom format strings for bool can solve this convenience issue.

Standard format strings:

Custom format strings will use the ; section specifier to specify the value if true and the value if false, similarly to how it is used in custom numeric format strings. This allows format strings such as yes;no. Spaces are allowed. Support for the \ escape character is also required when either of the strings contains ;.

With these new format strings, System.Boolean will be able to implement IFormattable (and ISpanFormattable) to enable generic formatting in the same away as other primitive types.

Additionally, once static abstract members in interfaces are available, generic parsing can be made possible by implementing IParseable<T> (and ISpanParseable<T>). (Design document)

Note about performance: These format strings are a convenience feature for general use. A ternary expression will be faster in most scenarios since there is no need to parse a format string.

API Proposal

Formatting:

public partial readonly struct Boolean : IFormattable, ISpanFormattable
{
    // IFormattable member
    public string ToString(string? format, IFormatProvider? provider);
    // Convenience overload
    public string ToString(string? format);
    // ISpanFormattable member
    public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider provider);
    // Convenience overload
    public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format);
}

Parsing:

public partial readonly struct Boolean : IParseable<Boolean>, ISpanParseable<Boolean>
{
    // IParseable<Boolean> members
    public static Boolean Parse(string s, IFormatProvider? provider);
    public static bool TryParse(string s, IFormatProvider? provider, out Boolean result);
    // Convenience overloads
    public static Boolean Parse(string s);
    public static bool TryParse(string s, out Boolean result);

    // ISpanParseable<Boolean> members
    public static Boolean Parse(ReadOnlySpan<char> s, IFormatProvider? provider);
    public static bool TryParse(ReadOnlySpan<char> s, IFormatProvider? provider, out Boolean result);
    // Convenience overloads
    public static Boolean Parse(ReadOnlySpan<char> s);
    public static bool TryParse(ReadOnlySpan<char> s, out Boolean result);
}

API Usage

var id = 42;
var isEnabled = false;
// Prints "Id: 42 - Disabled"
Console.WriteLine($"Id: {id} - {isEnabled:Enabled;Disabled}");
// Prints "false"
Console.WriteLine(isEnabled.ToString("g"));

// b is true
var b = bool.Parse("Apple", "Apple;Orange");
// b2 is false
bool.TryParse("false", "g", out var b2);

Alternative Designs

G and g are the same letter which might be confusing. Also, if more than 2 casing rules are commonly needed, they are insufficient. Instead, the standard format strings could be assigned each to different letters:

Risks

This proposal adds even more magic format specifiers to remember. This can be addressed by #58329.

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

vcsjones commented 2 years ago

There was some prior discussion on boolean format strings at https://github.com/dotnet/runtime/issues/17141.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation Simply printing a boolean value will always result in `True` or `False`. In some cases, it may be desirable to print a different constant value based on a boolean value when using string interpolation. This could be made easier by introducing format strings for `bool`. For example, here is a naive implementation without string interpolation: ``` public override string ToString() { return "Id: " + Id + " - " + (IsEnabled ? "Enabled" : "Disabled"); } ``` Currently, there is no clean way to implement the above method with string interpolation. Examples: ``` // 1. Ternary expression inside return $"Id: {Id} - {(IsEnabled ? "Enabled" : "Disabled")}"; // 2. Ternary expression outside return IsEnabled ? $"Id: {Id} - Enabled" : $"Id: {Id} - Disabled"; ``` Adding standard and custom format strings for `bool` can solve this issue. Standard format strings: - `G`: The default behavior - `True`/`False` - `g`: Lowercase version of the default - `true`/`false` (Addresses #45450) Custom format strings will use the `;` section specifier to specify the value if `true` and the value if `false`, [similarly to how it is used in custom numeric format strings](https://docs.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--section-separator). This allows format strings such as `yes;no`. Spaces are allowed. Support for the `\` escape character is also required when either of the strings contains `;`. Performance of the new format strings can be more easily optimized than existing code, since as of C# 10 / .NET 6, interpolated strings are already parsed by the compiler and converted to use `DefaultInterpolatedStringHandler`. Therefore, as long as the format string is a compile-time constant, performance of the new format specifiers can be the same or better in most cases. Theoretically, `$"{value:g}"` could provide a performance gain in microbenchmarks over `$"{value.ToString().ToLowerInvariant()}"` (which is recommended by the docs in the remarks section [here](https://docs.microsoft.com/en-us/dotnet/api/system.boolean.tostring?view=net-6.0)). ### API Proposal ```C# public partial readonly struct Boolean { public string ToString(string? format); } ``` ### API Usage ```C# var id = 42; var isEnabled = false; // Prints "Id: 42 - Disabled" Console.WriteLine($"Id: {id} - {isEnabled:Enabled;Disabled}"); // Prints "false" Console.WriteLine(isEnabled.ToString("g")); ``` ### Alternative Designs Another overload which accepts `IFormatProvider?` should be considered for consistency with other primitives, even though it has no effect. ### Risks This proposal adds even more magic format specifiers to remember. This can be addressed by #58329.
Author: OronDF343
Assignees: -
Labels: `api-suggestion`, `area-System.Runtime`, `untriaged`
Milestone: -
stephentoub commented 2 years ago

Therefore, as long as the format string is a compile-time constant, performance of the new format specifiers can be the same or better in most cases.

In your example with {isEnabled:Enabled;Disabled}, the format specifier passed by the compiler to Append will be "Enabled;Disabled". That means Boolean.TryFormat would need to parse that for the semicolon. That is more expensive than if you'd written {(isEnabled ? "Enabled" : "Disabled")}, which is already possible today, almost as concise, and certainly more efficient.

OronDF343 commented 2 years ago

@stephentoub Noted, I assumed too much. I've removed that statement for now.

stephentoub commented 2 years ago

Theoretically, $"{value:g}" could provide a performance gain in microbenchmarks over $"{value.ToString().ToLowerInvariant()}"

It would be more efficient than that (I'm sad that's what the docs suggest, we should fix that) , but not more than {(value?"true":"false")}. I'd need to measure, but I expect my version that's already possible would be measurably better than {value:g}.

I can see a reasonable usability argument for adding the proposed G/g. I don't believe there's a reasonable performance argument to adding this.

tannergooding commented 2 years ago

I can see a reasonable usability argument for adding the proposed G/g. I don't believe there's a reasonable performance argument to adding this.

👍

This would mostly just be a thing of convenience and users needing perf should really just cond ? "trueString" : "falseString".

We should be able to update the docs as appropriate.

ghost commented 2 years ago

This issue has been marked needs-author-action since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

tannergooding commented 2 years ago

I'm ok with taking a proposal to make this implement IFormattable + the new IParsable (and potentially ISpanFormattable, ISpanParsable) for consistency with the other primitive types.

I think that the proposal should be updated to fully cover those bits and the need/usage scenario around it first, however.

stephentoub commented 2 years ago

I'm also curious about the exact casing that would be supported. Three casings would seem to make some sense:

This proposal covers the first two. Is there ever a need for all uppercase, and if so, how does that play into this?

tannergooding commented 2 years ago

I don't think we have a mechanism to represent a third concept today. We basically just have G is all uppercase and g is all lowercase.

This also makes it interesting since that would mean G should, by "convention", return "TRUE" and not "True". -- I don't have a preference here or anything, just calling out the current behavior for other primitive types that I can remember.

OronDF343 commented 2 years ago

I have updated the proposal to reflect implementing formatting and parsing interfaces. I have also expanded upon the alternative design to address having more casing formats.

Basically, if necessary, we could make the standard format specifier different for each casing and make it case-insensitive (like many other standard format specifiers). G/g can be the default behavior, L/l can be lowercase, and U/u can be uppercase.

stephentoub commented 2 years ago

It turns out Utf8Formatter/Utf8Parser already recognizes 'l' as meaning "true"/"false". https://docs.microsoft.com/en-us/dotnet/api/system.buffers.text.utf8formatter.tryformat?view=net-6.0#system-buffers-text-utf8formatter-tryformat(system-boolean-system-span((system-byte))-system-int32@-system-buffers-standardformat) https://github.com/dotnet/runtime/blob/5c57f2c0cda44176e237574ceb51d659ef9915fa/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Boolean.cs#L46-L53 https://github.com/dotnet/runtime/blob/5c57f2c0cda44176e237574ceb51d659ef9915fa/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Boolean.cs#L22-L24 So adding an 'l' format to bool.TryFormat would at least be consistent with that.

TrayanZapryanov commented 2 years ago

XmlConvert also have parsing logic for true/false : https://github.com/dotnet/runtime/blob/5639f8985978c44d422216905a78704b780d2b79/src/libraries/System.Private.Xml/src/System/Xml/XmlConvert.cs#L762

stephentoub commented 2 years ago

XmlConvert also have parsing logic for true/false

We should discuss it, but I'm not sure we'd want to add parsing for numerical values to bool.

OronDF343 commented 2 years ago

I agree that hardcoding the values is best for performance.

Performance is certainly not the goal of this proposal, I was simply trying to compare with other options that make sense for general use, to show that for the average developer there is no significant performance cost.

A ternary expression is relatively concise, but has nested double quotes in it, which always worsened readability to me. Also, if I am not using string interpolation, and am instead passing generalized parameters to a composite format string defined elsewhere (i.e. a templating system of some sort), acheiveing the same result becomes more difficult.

On March 31, 2022 6:57:00 PM GMT+03:00, Stephen Toub @.***> wrote:

Theoretically, $"{value:g}" could provide a performance gain in microbenchmarks over $"{value.ToString().ToLowerInvariant()}"

It would be more efficient than that, but not more than {(value?"true":"false")}. I'd need to measure, but I expect my version that's already possible would be measurably better than {value:g}.

I can see a reasonable usability argument for adding the proposed G/g. I don't believe there's a reasonable performance argument to adding this.

-- Reply to this email directly or view it on GitHub: https://github.com/dotnet/runtime/issues/67388#issuecomment-1084781196 You are receiving this because you authored the thread.

Message ID: @.***> -- Sent from my Android device with K-9 Mail. Please excuse my brevity.