crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Non-zero padded exponent in float string representation after Ryu implementation #14682

Closed: franciscoadasme closed this issue 1 week ago

franciscoadasme commented 3 weeks ago

As discussed in the forum post of the same name, PR #14084 (included in v1.11) introduced an unintended change in scientific notation: single-digit exponents are no longer zero-padded.

# before
printf("%E", 123.45) # => 1.234500E+02
printf("%E", 123.45e15) # => 1.234500E+17
# after
printf("%E", 123.45) # => 1.234500E+2 (note the missing leading zero in the exponent)
printf("%E", 123.45e15) # => 1.234500E+17

This no longer follows the C99 standard, which most languages adhere to. Furthermore, the official and other Ryu implementations (as used in #14084) also print zero-padded exponents. Zero-padding keeps numbers nicely aligned, which is useful when writing files with hundreds or thousands of lines of floating-point numbers, such as those used in computational chemistry, my field of work.
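On releases that include #14084 (starting with v1.11), code that needs the zero-padded layout could post-process the formatted string. A minimal sketch, assuming the input is `%e`/`%E` output; `pad_exponent` is a hypothetical helper, not a Crystal standard library method:

```crystal
# Hypothetical helper (not part of Crystal's standard library): re-pads a
# single-digit exponent in %e/%E output to two digits, as C99 requires.
# Exponents that already have two or more digits are returned unchanged.
def pad_exponent(formatted : String) : String
  # The exponent marker is the last "E" (or "e") in the string.
  idx = formatted.rindex('E') || formatted.rindex('e')
  return formatted unless idx

  head = formatted[0, idx + 1]  # mantissa plus the marker, e.g. "1.234500E"
  tail = formatted[(idx + 1)..] # sign and digits, e.g. "+2" or "-6"

  # Pad only a signed, single-digit exponent.
  if tail.size == 2 && (tail[0] == '+' || tail[0] == '-') && tail[1].ascii_number?
    "#{head}#{tail[0]}0#{tail[1]}"
  else
    formatted
  end
end

puts pad_exponent("1.234500E+2")  # prints 1.234500E+02
puts pad_exponent("1.234500E+17") # prints 1.234500E+17 (unchanged)
puts pad_exponent("1.0e-6")       # prints 1.0e-06
```

Wrapping the formatter call, e.g. `pad_exponent("%E" % 123.45)`, should then give the zero-padded form whether or not the running Crystal version drops the leading zero.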

I'm asking for the format to be reverted so that it includes the leading zero again.

beta-ziliani commented 3 weeks ago

I mildly agree with reverting this change in formatting:

As mentioned in the forum, the original reason to drop the leading zero was to make it consistent with normal printing (e.g., 1e-6.to_s returns "1.0e-6").

So reverting it means breaking again the internal consistency, in favor of some external consistency and backward consistency.

I would like to point out, though, that the argument made was that padding ensures numbers will be parsed correctly. That sounds a bit sketchy: while Python, Ruby, OCaml, and, of course, C seem to agree on this formatting, .NET and Haskell don't. .NET pads the exponent to three digits. The C99 standard says (bold is mine):

The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent.

Technically speaking, if a parser fails to parse a number whose exponent is not in the "%2d" format, it might just as well break on .NET's "%3d".

Haskell does what Crystal 1.11 does.

ghci> Text.Printf.printf "%e\n" 1e-6
1.0e-6
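The C99 rule quoted above ("at least two digits, and only as many more digits as necessary") can be stated as a small function, which also makes the .NET and Haskell variants easy to compare. This is an illustrative sketch only: `exponent_field` and its `min_digits` parameter are hypothetical names, with `min_digits = 2` matching the C99 wording, `3` matching the .NET padding described in this thread, and `1` matching the unpadded output. The last lines show that a reader which consumes all exponent digits, such as `String#to_f`, accepts every width.

```crystal
# Hypothetical helper (not a Crystal stdlib method): formats an exponent with
# an explicit sign and at least `min_digits` digits, adding more digits only
# when the value needs them. min_digits = 2 follows the C99 wording.
def exponent_field(exponent : Int32, min_digits : Int32 = 2) : String
  sign = exponent < 0 ? '-' : '+'
  "#{sign}#{exponent.abs.to_s.rjust(min_digits, '0')}"
end

puts exponent_field(2)    # prints +02 (C99 style)
puts exponent_field(-6)   # prints -06
puts exponent_field(308)  # prints +308 (extra digit only when needed)
puts exponent_field(2, 3) # prints +002 (three-digit padding, as described for .NET)
puts exponent_field(2, 1) # prints +2 (unpadded, as in Crystal 1.11 and Haskell)

# A reader that consumes however many exponent digits are present accepts
# every width, so these three spellings all parse to the same Float64.
values = ["1.2345E+2", "1.2345E+02", "1.2345E+002"].map(&.to_f)
puts values.uniq.size # prints 1
```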
Sija commented 3 weeks ago

So reverting it means breaking again the internal consistency, in favor of some external consistency and backward consistency.

@beta-ziliani I wouldn't call the C99 spec "some external consistency". It is a spec, after all.

So, according to the part of the spec you've quoted, both .NET and Haskell are simply wrong (and a minority among languages). Following the spec here means reverting to the previous behaviour.

ysbaddaden commented 3 weeks ago

@Sija they're not wrong; they're free not to follow an external spec perfectly. Language X's printf doesn't have to be an exact C printf implementation.

ysbaddaden commented 2 weeks ago

That being said, the "at least two digits" rule likely didn't come from nowhere. It may have some practical use: perhaps just improving readability, or perhaps improving interoperability with other languages, since the format is consistent.

beta-ziliani commented 2 weeks ago

Or maybe it's just historical?

franciscoadasme commented 2 weeks ago

Hey everyone, thank you for your input on this minor issue. I think the main problem is that #14084 introduced the change without documenting it, as @straight-shoota suggested in the forum post. From your discussion, there's no agreement on which format should be used, so I think it's better to revert to the previous behavior for now. If it's later decided to drop the leading zero, that should be clearly documented in the PR and release notes.

IMHO, being consistent with the C99 standard and most other languages is important to avoid surprises in a heterogeneous (multi-language) environment, especially in scientific software (I believe Crystal is an excellent language for this use case). I think there should be a very strong reason to go against a widely used spec. The internal-consistency concern could instead be resolved by following this standard in normal float printing as well, whenever it uses scientific notation.