dotnet / dotnet-api-docs

.NET API reference documentation (.NET 5+, .NET Core, .NET Framework)
https://docs.microsoft.com/dotnet/api/

Double and float round-trip string format confusion #9320

Open ptasev opened 1 year ago

ptasev commented 1 year ago

The following article says that instead of the round-trip specifier R, G17 and G9 should be used for double and float respectively. It also says the R specifier may be slower than G17. However, according to a blog post about .NET Core 3.0, this has since been fixed. Can we get some clarification on whether G17 is still the recommendation?

Docs mentioning not to use R: https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings#round-trip-format-specifier-r https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings#standard-format-specifiers

Blog post: https://devblogs.microsoft.com/dotnet/floating-point-parsing-and-formatting-improvements-in-net-core-3-0/#making-the-parser-ieee-compliant

krwq commented 1 year ago

cc: @tannergooding

tannergooding commented 1 year ago

In .NET Framework, ToString() did not guarantee roundtripping; it was basically equivalent to G15. ToString("R") was more expensive because it first formatted with G15, then parsed the result and compared it to the original, and if that comparison failed, it did the equivalent of returning G17. It basically just tried to get a string that would roundtrip and didn't care about length. This made it more expensive, and thus G17 was a faster way to get a string that would definitively roundtrip. There was, however, an issue in that the .NET Framework code also has some minor bugs and can still fail to produce roundtrippable values in some cases. It likewise treats higher precisions (like G20) the same as G17.

In .NET Core 3.0+, ToString() and ToString("R") are equivalent. Both return the "shortest roundtrippable string", and both are the fastest way to get a roundtrippable string. We also support precisions up to G99, which is more than sufficient to get the full underlying value of most inputs. However, some values can have more precision (a double can have up to 767 significant digits), and specifying higher precisions could lead to weird results: ToString("G999") would return the literal "G999", because it got treated as a custom numeric format string.
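As an illustrative sketch of the behavior described above (not code from the runtime), the following compares the default, "R", and "G17" formats and checks that the formatted text parses back to the original value. It assumes .NET Core 3.0 or later; the class name RoundTripDemo and the helper RoundTrips are hypothetical names introduced only for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; the names here are illustrative, not part of any .NET API.
public static class RoundTripDemo
{
    // Formats the value with the given format string (invariant culture),
    // parses the text back, and reports whether the original value was recovered.
    public static bool RoundTrips(double value, string format)
    {
        string text = value.ToString(format, CultureInfo.InvariantCulture);
        double parsed = double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
        // double.Equals treats NaN as equal to NaN, unlike the == operator.
        return parsed.Equals(value);
    }

    public static void Main()
    {
        double value = 0.1;

        // On .NET Core 3.0+ the first two lines are expected to match (shortest
        // roundtrippable string); "G17" always emits up to 17 significant digits.
        Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
        Console.WriteLine(value.ToString("R", CultureInfo.InvariantCulture));
        Console.WriteLine(value.ToString("G17", CultureInfo.InvariantCulture));

        Console.WriteLine(RoundTrips(value, "R"));
        Console.WriteLine(RoundTrips(value, "G17"));
    }
}
```

Passing CultureInfo.InvariantCulture on both sides keeps the decimal separator stable across machines, which matters when the string is written on one system and parsed on another.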

In .NET 6, we raised the maximum specifiable precision to int.MaxValue and made anything larger throw (ToString("G4000000000") throws rather than returning "G4000000000"). This then allowed the full underlying value for float/double and other cases to be printed.

We then restricted that to no more than 999,999,999 in .NET 7 to avoid issues with accidental overflow if a user specified a larger value and because that is far beyond what any reasonable computer could support for a number.

ptasev commented 1 year ago

Sounds like for the sake of serializing data the R specifier should do the trick. I'm curious what use cases would require more precision than the shortest roundtrippable string.

WizardBrony commented 5 months ago

@tannergooding Even though ToString() now has (in my opinion) more intuitive behavior by returning a round-trippable string by default, it was technically a breaking change at .NET Core 3.0. Additionally, as @ptasev pointed out, the documentation still recommends G17 as the format for round-tripping doubles. With that in mind, say I wanted to write code "once" that depends on round-trip functionality. From an API perspective, which format guarantees me that functionality for all future versions of .NET?

tannergooding commented 5 months ago

With that in mind, say I wanted to write code "once" that depends on round-trip functionality. From an API perspective, which format guarantees me that functionality for all future versions of .NET?

As per the above, .NET Framework has bugs which cannot be fixed due to backwards compatibility concerns and there is no format that guarantees roundtripping under all scenarios.

If you need to support .NET Framework, then ToString("G17") is the closest you can get to correct without rolling your own formatting algorithm. If you only need to support .NET Core, then simply using ToString("R") is sufficient and will ensure a roundtrippable result with the best performance.
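One way to apply this recommendation in a library that targets both runtimes is sketched below. It is only a sketch under stated assumptions: the class DoubleSerialization and its methods are hypothetical names, and it relies on the NETFRAMEWORK preprocessor symbol that SDK-style projects define when targeting .NET Framework. On .NET Framework the G17 path is still subject to the formatting bugs mentioned above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not an existing .NET API.
public static class DoubleSerialization
{
#if NETFRAMEWORK
    // .NET Framework: "G17" is the closest available option to a reliable round trip.
    private const string RoundTripFormat = "G17";
#else
    // .NET Core 3.0+: "R" yields the shortest roundtrippable string.
    private const string RoundTripFormat = "R";
#endif

    // Formats with the invariant culture so the output does not depend on machine settings.
    public static string Serialize(double value) =>
        value.ToString(RoundTripFormat, CultureInfo.InvariantCulture);

    // Parses text produced by Serialize, again with the invariant culture.
    public static double Deserialize(string text) =>
        double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);

    public static void Main()
    {
        double original = Math.PI;
        string text = Serialize(original);
        double restored = Deserialize(text);
        Console.WriteLine(text);
        Console.WriteLine(restored.Equals(original));
    }
}
```

Selecting the format at compile time keeps the call sites identical on both targets, so the choice can be revisited in one place.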

WizardBrony commented 5 months ago

If you only need to support .NET Core, then simply using ToString("R") is sufficient and will ensure a roundtrippable result with the best performance.

Sounds great, thank you for confirming. Are there plans to update the official documentation to use R instead of G17 for round-tripping at .NET Core 3.0+, or is there a reason for keeping it worded as-is?

tannergooding commented 5 months ago

I don't believe there are explicit plans to update, but there's also no reason to keep it as is and contributions are welcome.

It really would just need a callout of "On .NET Framework" vs "On .NET Core". We have such callouts elsewhere in the doc, such as for the precision specifier:

When the precision specifier controls the number of fractional digits in the result string, the result string reflects a number that is rounded to a representable result nearest to the infinitely precise result. If there are two equally near representable results:

  • On .NET Framework and .NET Core up to .NET Core 2.0, the runtime selects the result with the greater least significant digit (that is, using MidpointRounding.AwayFromZero).
  • On .NET Core 2.1 and later, the runtime selects the result with an even least significant digit (that is, using MidpointRounding.ToEven).
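
To make the two modes named in that callout concrete, here is a small sketch that applies them through Math.Round. It only illustrates what MidpointRounding.AwayFromZero and MidpointRounding.ToEven mean for exact midpoints; it does not exercise the numeric formatter itself, and the class name MidpointDemo is hypothetical.

```csharp
using System;

// Hypothetical demo class; illustrates the MidpointRounding modes only.
public static class MidpointDemo
{
    public static void Main()
    {
        // 0.5, 1.5 and 2.5 are exactly representable as doubles, so each is a true midpoint.
        foreach (double x in new[] { 0.5, 1.5, 2.5 })
        {
            double away = Math.Round(x, MidpointRounding.AwayFromZero); // picks the larger magnitude
            double even = Math.Round(x, MidpointRounding.ToEven);       // picks the even neighbor
            Console.WriteLine($"{x}: AwayFromZero={away}, ToEven={even}");
        }
    }
}
```

The two modes differ only where the even neighbor is the smaller one, as with 0.5 and 2.5; for 1.5 both round to 2.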

WizardBrony commented 5 months ago

I don't believe there are explicit plans to update, but there's also no reason to keep it as is and contributions are welcome.

Got it. Thanks again.