fmtlib / fmt

A modern formatting library
https://fmt.dev

Default floating-point formatting does not produce shortest outputs; mismatch with `std::format` #3649

Open jk-jeon opened 1 year ago

jk-jeon commented 1 year ago

As far as I understand, the default formatting option should produce the shortest output, measured not just in the number of significand digits but also in the number of actual characters. At least, that seems to be how std::format is specified, since its default floating-point output is defined in terms of std::to_chars.

However, fmt currently seems to pick fixed-point notation whenever the exponent is between -4 and 16, regardless of how many characters that produces: https://github.com/fmtlib/fmt/blob/3baaa8d899ced2f9ded80a3f142efd41808730e3/include/fmt/format.h#L2644

Is this an intended divergence? Or maybe I misunderstood how std::format is specified?

For what it's worth, the MS STL implementation of std::format seems to do what I described.
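To make the difference concrete, here is a simplified C++ model of the two notation-choice rules. It is a sketch under stated assumptions, not fmt's or any standard library's code: the helper names are hypothetical, the inputs are assumed to be the shortest round-trip significand digits plus the decimal exponent of the first digit (as a shortest-digit algorithm such as Ryu or Dragonbox would produce), the fixed-point range is assumed to be the half-open interval -4 <= exp < 16, and sign, NaN, and infinity are ignored.

```cpp
#include <string>

// Simplified model of the notation choice only (hypothetical helpers).
// `digits`: shortest round-trip significand digits, e.g. "1" for 1e15.
// `exp10`:  decimal exponent of the first digit, e.g. 15 for 1e15.
// Sign, NaN and infinity are not handled.

// Fixed notation, e.g. ("15", 0) -> "1.5", ("1", -4) -> "0.0001".
std::string to_fixed(const std::string& digits, int exp10) {
  const int n = static_cast<int>(digits.size());
  if (exp10 < 0)  // "0." followed by zeros before the first digit
    return "0." + std::string(-exp10 - 1, '0') + digits;
  if (exp10 + 1 >= n)  // integer: pad with trailing zeros
    return digits + std::string(exp10 + 1 - n, '0');
  return digits.substr(0, exp10 + 1) + "." + digits.substr(exp10 + 1);
}

// printf-style %e notation with at least two exponent digits, e.g. "1e+15".
std::string to_scientific(const std::string& digits, int exp10) {
  std::string s(1, digits[0]);
  if (digits.size() > 1) s += "." + digits.substr(1);
  s += exp10 < 0 ? "e-" : "e+";
  const int e = exp10 < 0 ? -exp10 : exp10;
  if (e < 10) s += '0';
  return s + std::to_string(e);
}

// Character-count rule: fewer characters wins, a tie goes to fixed.
std::string shortest_chars(const std::string& digits, int exp10) {
  const std::string f = to_fixed(digits, exp10);
  const std::string e = to_scientific(digits, exp10);
  return e.size() < f.size() ? e : f;
}

// Range rule (assumed bounds): fixed iff -4 <= exp10 < 16.
std::string range_rule(const std::string& digits, int exp10) {
  return (exp10 >= -4 && exp10 < 16) ? to_fixed(digits, exp10)
                                     : to_scientific(digits, exp10);
}
```

Under these assumptions 1e15 becomes "1e+15" (5 characters) under the character-count rule but "1000000000000000" (16 characters) under the range rule, and 0.0001 becomes "1e-04" versus "0.0001". The model deliberately reuses the shortest digits in both notations, so it still yields "123456790" for 123456792.0f; the to_chars behavior discussed further down the thread goes beyond it.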

vitaut commented 1 year ago

fmt::format is modeled after Python's str.format, where "shortest" refers to the precision, not to the full output. std::format diverged a bit because it was specified in terms of to_chars.

jk-jeon commented 1 year ago

I honestly feel that the shortest string is what people would expect, but that's of course just a subjective opinion. If you decide to change the behavior (or accept a PR that does so) in the future, that would be great. If not, please feel free to close this, but I think the difference should be documented anyway, in places like https://fmt.dev/dev/api.html#compatibility-with-c-20-std-format.

vitaut commented 1 year ago

I am open to PRs that address this, backed by more analysis of the effects of the change and by concrete examples.

scurest commented 9 months ago

Note that this also results in the rather surprising (to me) behavior that, e.g., 123456792.0f formats as "123456790", with the last digit apparently wrong. However, both values round-trip to the same float, and 123456790 is shorter in the sense of having fewer significant figures.

std::to_chars formats it as 123456792.
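Both halves of this can be checked with the C++ standard library alone. A minimal sketch, assuming a correctly rounding std::strtof (round to nearest, ties to even); same_float and gap_above are hypothetical helper names:

```cpp
#include <cmath>
#include <cstdlib>

// True if both decimal strings parse (via std::strtof, which rounds to
// nearest) to the same float.
bool same_float(const char* a, const char* b) {
  return std::strtof(a, nullptr) == std::strtof(b, nullptr);
}

// Distance from x to the next larger float (one ulp above x).
float gap_above(float x) {
  return std::nextafter(x, INFINITY) - x;
}
```

Floats near 1.2e8 are 8 apart, so 123456790 falls between the neighbors 123456784 and 123456792 and rounds to the closer one, 123456792.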

vitaut commented 9 months ago

This is unrelated, and I am surprised that to_chars produces "garbage" digits in this case.

jessey-git commented 9 months ago

Why is that "garbage" in this case? That value is perfectly representable as a float. For example, here's a nicely formatted sweep of some values: https://godbolt.org/z/a3Y8r1v6K

Is there a way to control how many digits are printed (and where rounding happens) in this particular case, without switching to exponential notation, or should this be filed as a separate issue altogether?

vitaut commented 9 months ago

That's the term used in the Grisu paper. You can control the precision, so there is no issue here.
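As an illustration of precision control, the sketch below uses printf-style formatting through std::snprintf rather than fmt itself; the assumption (not verified here) is that fmt's {:.0f} and {:.9} specifiers behave analogously to %.0f and %.9g. The helper names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Fixed notation with `precision` digits after the decimal point ("%.*f").
std::string fixed(double value, int precision) {
  char buf[512];  // large enough for %.0f of any finite double
  std::snprintf(buf, sizeof buf, "%.*f", precision, value);
  return buf;
}

// General notation with `precision` significant digits ("%.*g").
std::string general(double value, int precision) {
  char buf[512];
  std::snprintf(buf, sizeof buf, "%.*g", precision, value);
  return buf;
}
```

With these, fixed(123456792.0f, 0) gives "123456792" and general(123456792.0f, 9) gives the same digits, whereas general(123456792.0f, 8) switches to "1.2345679e+08" because %g uses exponential notation once the decimal exponent reaches the precision.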

jk-jeon commented 9 months ago

So this seems to be because std::to_chars is specified in terms of the number of characters, not the number of decimal digits. 123456784 and 123456780 both have the shortest length, but the former is closer to the true value, so an implementation faithfully following the standard must print the former.
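The digit-count side of this can be approximated with a brute-force search: round the value to p significant digits with printf's %e and keep the smallest p that parses back to the same float. This is a hypothetical helper and a stand-in for Ryu or Dragonbox, not a proven-shortest algorithm for every edge case:

```cpp
#include <cstdio>
#include <cstdlib>

// Smallest p such that x rounded to p significant digits ("%.*e" with
// p - 1 fractional digits) parses back to x via std::strtof.
int shortest_sig_digits(float x) {
  char buf[64];
  for (int p = 1; p < 9; ++p) {
    std::snprintf(buf, sizeof buf, "%.*e", p - 1, static_cast<double>(x));
    if (std::strtof(buf, nullptr) == x) return p;
  }
  return 9;  // 9 significant digits always round-trip a float
}
```

For 123456784.0f it returns 8, i.e. the digits 12345678, which in fixed notation read "123456780": nine characters, exactly as many as the exact value "123456784". (That shorter-digit string round-trips only because it lies exactly halfway between two floats and strtof breaks the tie toward the even significand.)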

So... this is interesting... we may need to look at what std::to_chars implementers have done if we ever want to implement this behavior in fmt.

EDIT: Here is the relevant code from microsoft/STL:

https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1368
https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1406