albertodemichelis / squirrel

Official repository for the programming language Squirrel
http://www.squirrel-lang.org
MIT License
923 stars 160 forks source link

Floatingpoint numeric parsing is broken if the locale is not english #42

Open bluehazzard opened 8 years ago

bluehazzard commented 8 years ago

Hi, on Linux if I use a locale with ',' as decimal point separator the parsing of the floating points in non string expressions is broken, because _fvalue = (SQFloat)scstrtod(&_longstr[0],&sTemp); returns a wrong value.

For example: The squirrel code printf("%f",12.34) will output: 12 scstrtod returns 12 because it expects a ',' as decimal separator...

A solution would be to manually parse the floating point number if it is not in a string literal...

zeromus commented 8 years ago

Looks like what lua might do is:

#include <locale.h>
case TFLOAT:
    for(int i=0;i<_longstr.size();i++)
        if(_longstr[i] == '.')
            _longstr[i] = localeconv()->decimal_point[0];
    _fvalue = (SQFloat)scstrtod(&_longstr[0],&sTemp);

Another idea would be to wrap scstrtod with a function that is capable of handling this (in this same way generally), only on platforms where it's needed. Then on some other platforms you could pass the invariant locale into a proprietary variant function, if it exists.

is this a problem in sqbaselib's str2num() as well?

bluehazzard commented 8 years ago

_longstr[i] = localeconv()->decimal_point[0];

i don't think this is a good idea, because for example in arabic the decimal point is (or can be, depending on the code page) a multi byte symbol [1]. But i am not sure if this is a problem...

some other thoughts from the luajit author:

[...] \ Rationale for the builtin string to number conversion library: * It removes a dependency on libc's strtod(), which is a true portability * nightmare. Mainly due to the plethora of supported OS and toolchain * combinations. Sadly, the various implementations * a) are often buggy, incomplete (no hex floats) and/or imprecise, * b) sometimes crash or hang on certain inputs, * c) return non-standard NaNs that need to be filtered out, and * d) fail if the locale-specific decimal separator is not a dot, * which can only be fixed with atrocious workarounds. * Also, most of the strtod() implementations are hopelessly bloated, * which is not just an I-cache hog, but a problem for static linkage \ on embedded systems, too. [,,,]

This is quoted from his own implementation of strtod found in the _ljstrscan.c from his git repo (http://luajit.org/download.html)

is this a problem in sqbaselib's str2num() as well?

to make things more clear: this line _fvalue = (SQFloat)scstrtod(&_longstr[0],&sTemp); is from sqlexer.cpp line 460 so it is a compiler/lexer problem not a implementation problem of functions for example print(format("%f",12.34)) returns 12.0000000

print("12.34".tofloat()) returns 12

interestingly i can't reproduce this errors with the binary provided with squirrel. But it should not work, as also the lua implementation has a workaround. There seems to be a mess up with the locale settings (one point for the luajit implementation)...

i will investigate future

[1] https://en.wikipedia.org/wiki/Decimal_mark

greetings

bluehazzard commented 8 years ago

I know now why the provided binary of squirrel ("sq") works. The internal locale is "C". I don't know why it is using this locale, because LANG is set to "de_DE.UTF-8"

anyway it is a bad bug in the squirrel lexer...

zeromus commented 8 years ago

I think it isn't proper to have a lexer depending on someone else's flaky numerical lexer. If luajit has a better function, we should use that. I propose this rule: whatever a prominent and well-tested lua does for parsing numbers is the best choice for squirrel. Can you think of any exceptions to this rule? I think I will try to integrate luajit's strtod replacement within a few days if I don't hear anything else.

mingodad commented 8 years ago

I agree with you .

albertodemichelis commented 8 years ago

I think luajit strtod is unnecessarily complicated, I'd rather find something simpler or write one.

zeromus commented 8 years ago

Check this out

https://github.com/mono/mono/blob/master/mcs/class/referencesource/mscorlib/system/number.cs ParseNumber().

https://github.com/mono/mono/blob/master/mono/metadata/number-ms.c mono_double_from_number() and number_to_double()

(all MIT licensed)

You can strip some options out of the c# code and it ends up being pretty compact, IMO. Then if theres any bugs in a new implementation for squirrel, we can check it against .net as a reference.

breakin commented 7 years ago

Was this ever fixed? I don't seem to have this issue right now but as a person living in a decimal-comma-country I've suffered the pain before.

Here is a commit in another open source project where they fixed the same issue: https://github.com/imageworks/OpenShadingLanguage/commit/0f339174f24f4766b869dc30b24bfb83fa6a5dfd https://groups.google.com/forum/#!topic/osl-dev/OB4N4Rpdobs

zeromus commented 7 years ago

This was never fixed. I still suggest we use one of the two I suggested. I'll volunteer to do the work if @albertodemichelis will okay the concept

albertodemichelis commented 7 years ago

I written a float parser for this, I just need to find it :) and test it.

albertodemichelis commented 7 years ago

I've put together this a while ago.

#include <math.h>
#include <assert.h>
double parsefloat(const char *s)
{
    double sign = 1;
    double res = 0;
    double scale = 0.0;
    const char *x = s;
    char c;
    if (*x == '-')
    {
        sign = -1;
        x++;
    }
    while (c = *x) { //until exponent
        if (c >= '0' && c <= '9')
        {
            res = res * 10 + (c - '0');
            scale *= 10;
        }
        else if(c == '.') {
            scale = 1;
        }
        else {
            break;
        }
        x++;
    }
    bool hasexp = false;
    double esign = 1;
    double exp = 0.0;
    int nexp = 0; //number of digits in exp
    if (c == 'e' || c == 'E')
    {
        x++;
        c = *x;
        hasexp = true;

        if (c == '+')
        {
            esign = 1;
            x++;
        }
        else if (c == '-')
        {
            esign = -1;
            x++;
        }

        while (c = *x) { //until end
            if (c >= '0' && c <= '9')
            {
                exp = exp * 10 + (c - '0');
                nexp++;
            }
            else {
                break;
            }
            x++;
        }
    }

    if (scale != 0) {
        res *= 1.0 / scale;
    }

    if (hasexp)
    {
        if (nexp == 0) {
            //error;
        }
        exp *= esign;
        res = res * pow(10, exp);
    }
    res *= sign;
    return res;
}

#define TEST_PARSE_DOUBLE(val) {assert(parsefloat(#val) == val);}
int main(int argc, char* argv[])
{
    TEST_PARSE_DOUBLE(0.5);
    TEST_PARSE_DOUBLE(9.99);
    TEST_PARSE_DOUBLE(12);
    TEST_PARSE_DOUBLE(12.0);
    TEST_PARSE_DOUBLE(12.0e+10);
    TEST_PARSE_DOUBLE(-122131242342.0e-10);
    TEST_PARSE_DOUBLE(.5);
    TEST_PARSE_DOUBLE(122138760e-2);
    return 0;
}
MrBrrown commented 2 months ago

Hi, I also encountered this problem and this solved it.

#include <locale>
#include <clocale>

struct DecimalFixer {
    DecimalFixer() {
        std::locale::global(std::locale(std::locale(), new fixed_decimal_point));
        setlocale(LC_NUMERIC, "C");
    }
private:
    struct fixed_decimal_point : public std::numpunct<char> {
        char_type do_decimal_point() const { return '.'; }
    };
};

static DecimalFixer  fixer; // Call this smth in lib code
zeromus commented 2 months ago

No, we need to integrate the handmade parser or find another. Depending on libc/c++ locale is not portable.

albertodemichelis commented 1 month ago

I'd integrate the handmade parser but I'm not an IEEE float expert, I'm not sure how correct that implementation is.