fastfloat / fast_float

Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12, Chromium and WebKit/Safari
Apache License 2.0
1.35k stars 124 forks source link

result modified when result_out_of_range #261

Closed igagis closed 2 weeks ago

igagis commented 3 weeks ago

the code

std::string_view str = "5.47382e-48";

float value = 100.0f;

auto res = fast_float::from_chars(str.data(), str.data() + str.size(), value, fast_float::general);

std::cout << "value = " << value << std::endl; // prints "value = 0"

assert(res.ptr == str.data() + str.size()); // assertion succeeds

assert(res.ec != std::errc::result_out_of_range); // assertion fails

actual result

expected result

justification

According to https://en.cppreference.com/w/cpp/utility/from_chars

1)

In any case, the resulting value is one of at most two floating-point values closest to the value of the string matching the pattern, after rounding according to std::round_to_nearest.

2)

On success, returns a value of type std::from_chars_result such that ptr points at the first character not matching the pattern, or has the value equal to last if all characters match and ec is value-initialized.

3)

If the pattern was matched, but the parsed value is not in the range representable by the type of value, returns value of type std::from_chars_result such that ec equals std::errc::result_out_of_range and ptr points at the first character not matching the pattern. value is unmodified.

So, from (1) and (2) we can say that the actual value of the string falls between 0 and smallest positive number which float can represent, perhaps closest one is 0. So, the output value should be set to 0 and error code value-initialized.

Even if the actual value of the string is considered to be out of range representable by float (which I think it is not) then the value should be unmodified, according to (3), and error code is result_out_of_range. But in this test the value is set to 0.

version

Found in version: 6.4.1 Last version where it worked as expected was: 3.11.0

lemire commented 3 weeks ago

So, from (1) and (2) we can say that the actual value of the string falls between 0 and smallest positive number which float can represent, perhaps closest one is 0. So, the output value should be set to 0 and error code value-initialized.

I agree with you and it is how I coded it initially, but people disagreed so this was changed, see...

https://github.com/fastfloat/fast_float/issues/120

I suggest you take up this issue with the C++ standardization people.

then the value should be unmodified

Ah. Let us check.

lemire commented 2 weeks ago

I just compiled and ran the following under Visual Studio 2022:

#include <iostream>
#include <charconv>

using namespace std;

int main()
{

    double result = -1;
    std::string str = "3e-1000";
    auto [ptr, ec] = std::from_chars(str.data(), str.data() + str.size(), result);
    cout << (ec == std::errc::result_out_of_range) << endl;
    cout << result << endl;
    return 0;
}

It prints 1 and 0.

image

image

lemire commented 2 weeks ago

Indeed, Microsoft has the same interpretation:

https://github.com/microsoft/STL/blob/62205ab155d093e71dd9588a78f02c5396c3c14b/tests/std/tests/P0067R5_charconv/test.cpp#L943-L946

cc @jwakely

lemire commented 2 weeks ago

We have documented the behaviour. I think that the library is correct. I am closing this issue.

igagis commented 2 weeks ago

Indeed, Microsoft has the same interpretation:

https://github.com/microsoft/STL/blob/62205ab155d093e71dd9588a78f02c5396c3c14b/tests/std/tests/P0067R5_charconv/test.cpp#L943-L946

This is strange, the out-of-range stuff is not that obvious and debatable, but come on, the value should remain unmodified if out-of-range is returned, this is something very obvious from the standard. (see my quote (3) from original post) So, Microsoft is doing it wrong, why does fast_float have to follow that wrong behavior?

Also, on the out-of-range. I checked the #120 ... and I can agree that 1e+1000 is out of float range, let's forget about infinity, as it is something like not a number. But 1e-1000 is clearly in float range. float range is approximately -3.4E+38 to 3.4E+38. Everything which falls between those boundaries is in float range. And 1e-1000 is in the float range. I can see that the argument is that 1e-1000 cannot be precisely represented by float type, but the standard is pretty clear about that:

the resulting value is one of at most two floating-point values closest to the value of the string matching the pattern, after rounding according to std::round_to_nearest

Let's also consider number 0.3, can it be precisely represented by float? Is it out-of-range?

Or, are we excluding 0 from the range? Why?

I suggest you take up this issue with the C++ standardization people.

Unfortunately, I have no time for that and see no much reason. In my opinion, the standard is written clear enough to make the conclusions. I'll just use the older version of fast_float for now.

So, if fast_float is intended as a drop-in replacement for std::from_chars I think it should follow the standard.

igagis commented 2 weeks ago

By the way, here is the quote from the README file of the library:

The resulting floating-point value is the closest floating-point values (using either float or double), using the "round to even" convention for values that would otherwise fall right in-between two values. That is, we provide exact parsing according to the IEEE standard.

lemire commented 2 weeks ago

@igagis

But 1e-1000 is clearly in float range.

The C language disagrees. Please run the following.

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>

int main(void)
{
    const char *p = "1e-1000 1e1000";
    printf("Parsing '%s':\n", p);
    char *end;
    for (double f = strtod(p, &end); p != end; f = strtod(p, &end))
    {
        printf("'%.*s' -> ", (int)(end-p), p);
        p = end;
        if (errno == ERANGE){
            printf("range error, got ");
            errno = 0;
        }
        printf("%f\n", f);
    }

}
igagis commented 2 weeks ago

I see. Looks like latest GCC has from_chars(float) support:

https://godbolt.org/z/M4bfbG7rd

As you can see it returns out-of-range as well, but does not modify the value. So, at least this one does not work according to standard.

igagis commented 2 weeks ago

At the same time, latest MSVC gives the other result, it modifies the value:

https://godbolt.org/z/63h84Tz46

lemire commented 2 weeks ago

@igagis

As you can see it returns out-of-range as well, but does not modify the value. So, at least this one does not work according to standard.

I am not sure what you mean by "So, at least this one does not work according to standard."

GCC uses fast_float under the hood. See the issue https://github.com/fastfloat/fast_float/issues/120 which I pointed to you. It is from the GCC folks.

What they do is catch the out of range errors and make sure that value is left unmodified.

As you have observed, that's not what Microsoft does.

I document carefully the fast_float stance in the README:

Screenshot 2024-08-26 at 10 49 51 AM

I think that the GCC stance is problematic. You can't distinguish 1e1000 and 1e-1000, both will leave the value unmodified and will set an out of range error. You have lost information. That is why I prefer the @jwakely's interpretation.

Please read what he wrote at https://cplusplus.github.io/LWG/issue3081

igagis commented 2 weeks ago

Ok, I see your points. My biggest misunderstanding here is why very small values are considered out-of-range. But, if there is a way to workaround this, I'm fine with it.

The problem is that due to those differences in behavior there is no such a way.

I take as given the statement that fast_float is a drop-in replacement for std::from_chars(float|double). If that is not true then I cannot complain. But let's think it is. Then, I'd expect that GCC and MSVC would behave identical, because otherwise my program using std::from_chars will not be compiler-independent. I will have to add #if bla bla bla preprocessor directives.

So far, I see that there is a standard, saying nothing about float range, but saying that value should not be modified if out-of-range is returned. GCC follows that, MSVC does not. Behavior is different.

I think that the GCC stance is problematic. You can't distinguish 1e1000 and 1e-1000, both will leave the value unmodified and will set an out of range error. You have lost information.

I agree with you, that modifying the value gives more information. And I see good intention to return more information, but it contradicts to the standard, and so we already have two different implementations. This means no compiler-independent programs...

Following the standard regarding non-modification of value and assuming 1e-1000 is out of float range means no way to detect if the out-of-range was due to too small or too big. Problem again.

So, we have two options: 1) To follow the standard => loose information, be consistent 2) To not follow the standard => preserve information, be inconsistent

none of these is better than the other. The best would be to treat 1e-1000 in range. But for some reason (strtod() precedent?) we have this insanity... why do we have to follow the precedent and take the suffer?

I understand that nothing will be done about this all, just ranting.

igagis commented 2 weeks ago

Also, just some background for the problem I faced. I'm parsing a user-supplied SVG file which have a number 5.47382e-48 as one of the primitive's coordinates. I see no reason why I should not just treat it as 0. I'm using from_chars(float) overload for parsing. Of course it fails with out-of-range. Now, using fast_float I can check if value is 0, then just return 0 and don't rise an error. But my plan is to later move to using std::from_chars(float) when it is widely available in std lib implementations. So, as we see GCC behaves differently and my code will not work anymore. Moreover, GCC's way (and standard) doesn't let me to check for 0, so the SVG parsing will be failing for sure. I see the initial cause of this issue is that 1e-1000 is treated as out of range for no reason.

lemire commented 2 weeks ago

@igagis I disagree with GCC and I agree that you should be getting zero when parsing 5.47382e-48.

lemire commented 2 weeks ago

Of course it fails with out-of-range.

This is a matter of interpretation. The fast_float library does not fail. It parses the value you expect and sets a flag indicating that the value is out of range, just like the C functions do.

Out of range just means that the value parsed is zero or +/- infinity.

If you are fine with out of range when it parses to zero, just add one-line

if(res.ec == std::errc::result_out_of_range && value == 0) { res.ec = std::errc(); }

C has worked this way for decades now. It is fine.

So fast_float is not a problem.

GCC is going to give you trouble, not fast_float.

igagis commented 2 weeks ago

GCC is going to give you trouble, not fast_float.

I would even say, it is the standard in combination with the strtod-precedent. GCC just follows the standard :).

lemire commented 2 weeks ago

@igagis

the strtod-precedent. GCC just follows the standard :).

The strtod approach is perfectly functional so I would not blame it. It works. It allows you to do whatever you need to do.

The from_chars approach was supposed to be a modernized version of the stdto* functions. It should not be less powerful.