Perl / perl5

netlib dtoa.c #14019

Open p5pRT opened 10 years ago

p5pRT commented 10 years ago

Migrated from rt.perl.org#122482 (status was 'new')

Searchable as RT122482$

p5pRT commented 10 years ago

From @jhi

As discussed elsewhere [1] there is a well-known solution for ascii-to-double (aka strtod) and double-to-ascii (aka printf, or Gconvert) conversions: the netlib dtoa.c [2]. This code is apparently used by Python, PHP, Java, Firefox, Chrome, Safari.
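For readers less familiar with the two directions, here is a minimal sketch using only the C library's own strtod() and snprintf(), not dtoa.c. The helper names text_to_double and double_to_text are hypothetical, and the round trip assumes IEEE 754 doubles and a libc that rounds correctly in both directions, which is exactly what varies across platforms.

```c
#include <stdio.h>
#include <stdlib.h>

/* ascii-to-double: wraps strtod() (hypothetical helper name).
 * Returns 1 only if the whole string was consumed as a number. */
int text_to_double(const char *s, double *out)
{
    char *end;
    *out = strtod(s, &end);
    return end != s && *end == '\0';
}

/* double-to-ascii: 17 significant digits are enough to round-trip an
 * IEEE 754 binary64 value, assuming both conversions round correctly.
 * Returns 1 if the text fit into buf. */
int double_to_text(double d, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%.17g", d);
    return n > 0 && (size_t)n < len;
}
```

Converting "0.1" to a double, printing it with double_to_text, and parsing the result again should give back the identical double on such a platform.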

I suggest looking into integrating this into Perl. I volunteer to do some of the work.

Pros:
- well tested and widely used fp/a conversion
- consistent handling of fp/a conversions across platforms
- consistent handling of inf/nan especially
- hexadecimal floats (it's a C99 feature, and even then seemingly inconsistently implemented)
- Python, PHP, and Java compatibility (har har)

Cons:
- new code to maintain: the netlib code (disregarding the copyright at the top) is still actively maintained, so updates do happen
- new code to include: ~4400 source lines, object code on Darwin ~37K
- does memory management of its own (since it's hard to know exactly how long a string to allocate for dtoa)
- has locale code in it ("1.23" -> double, duh); this may require rather complete ripping out / replacing due to our rather extensive locale-handling code

Unknowns/musings as of now:
- is the license compatible for us? (given the wide range of users, I would be rather surprised if there are problems)
- does dtoa.c work with long doubles?
- while dtoa is *for implementing* printf, how exactly does that work? (we have a string... now how do we do %10.3f ?)
- some platforms might have special quirks (I'm especially thinking nan/inf handling) that mean dtoa.c cannot be used and the native strtod/printf facilities still need to be used (though we could try backporting the work to dtoa.c and make the world a better place)

[1] https://rt-archive.perl.org/perl5/Ticket/Display.html?id=122219 ("support hexadecimal floats")
[2] http://www.netlib.org/fp/dtoa.c
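On the "%10.3f" musing, one way to picture the layering is sketched below. It assumes that dtoa.c's mode 3 hands back a digit string already rounded to the requested number of places past the decimal point (trailing zeros stripped, possibly empty), plus a decimal-point position (decpt) and a sign flag, as its header comment describes. The function format_fixed and its parameters are hypothetical, it does not call dtoa itself, and it ignores printf flags such as '-', '+' and '0'.

```c
#include <stdio.h>
#include <string.h>

/* Lay out a dtoa-style result the way "%<width>.<prec>f" would.
 * digits:   decimal digits, already rounded to prec fractional places
 * decpt:    position of the decimal point relative to digits[0]
 * negative: nonzero to emit a leading '-'
 * Returns the output length, or -1 if the arguments are out of range
 * or the result does not fit in out. */
int format_fixed(char *out, size_t outlen, const char *digits,
                 int decpt, int negative, int width, int prec)
{
    char body[512];
    size_t n = 0;
    int ndig = (int)strlen(digits);
    int i;

    /* Keep the scratch buffer safe: 1 + 200 + 1 + 200 + 1 < 512. */
    if (width < 0 || prec < 0 || prec > 200 || decpt > 200 || decpt < -200)
        return -1;

    if (negative)
        body[n++] = '-';

    /* Integer part: digits before the decimal point, zero-padded. */
    if (decpt <= 0) {
        body[n++] = '0';
    } else {
        for (i = 0; i < decpt; i++)
            body[n++] = (i < ndig) ? digits[i] : '0';
    }

    /* Fractional part: exactly prec digits, zero-filled where the
     * digit string has nothing to offer. */
    if (prec > 0) {
        body[n++] = '.';
        for (i = 0; i < prec; i++) {
            int idx = decpt + i;
            body[n++] = (idx >= 0 && idx < ndig) ? digits[idx] : '0';
        }
    }
    body[n] = '\0';

    /* Right-justify in the field, as printf does by default. */
    i = snprintf(out, outlen, "%*s", width, body);
    return (i >= 0 && (size_t)i < outlen) ? i : -1;
}
```

With the hand-written inputs digits "3142" and decpt 1 (what 3.14159 rounded to three places would look like), width 10 and prec 3, this produces "     3.142", the same text as printf("%10.3f", 3.14159).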

p5pRT commented 10 years ago

From @jhi

Initial investigation comments:

- turning off the private memory management
- using plain malloc/free for now (which is a different issue from the private memory management)
- turning off the locale support for now (it used, by accident, the same USE_LOCALE define as Perl...); in Perl code, the corresponding thing is GROK_NUMERIC_RADIX
- there's also code for multiple threads (locking certain things)

More importantly, looks like integrating this will be even more ... intense than expected: the Perl_my_atof and Perl_my_atof2 (and the helper, S_mulexp10) make for interesting reading, especially with the VAX (and Cray) specifics.

p5pRT commented 10 years ago

From @jhi

More importantly, looks like integrating this will be even more ... intense than expected: the Perl_my_atof and Perl_my_atof2 (and the helper, S_mulexp10) make for interesting reading, especially with the VAX (and Cray) specifics.

Actually, to be more explicit: it looks like we are not currently even using strtod() as such! (Except for NaN/Inf conversion.) The above triad is what takes care of the conversion. Should've remembered, this all started during my tenure... maybe too long ago.

The whole area of code is a quilt of numeric overflow ballet, with strange cases in various places.

Summary: I'm starting to doubt the benefit of bringing in the netlib strtod. However well tested and widely used it is, the current code has been well tested on the platforms Perl runs on.

And since we are not really depending on the system strtod()s anyway (except for nan/inf), it looks like for the hexadecimal fp "strtod-ing" it would be better just to implement our own. This would not, however, solve the hexadecimal fp output.
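To give a feel for the size of "implement our own", here is a rough sketch of hexadecimal fp input. It is not Perl's code: parse_hex_float and hex_value are hypothetical names, the accepted syntax is assumed to be "[+-]0x<hex>[.<hex>][p[+-]<dec>]", there is no inf/nan or locale handling, and mantissas wider than a double's 53 bits are not guaranteed to be correctly rounded.

```c
#include <ctype.h>
#include <math.h>

/* Value of one hexadecimal digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse a hexadecimal floating-point string into *out.
 * Returns 1 on success, 0 on a syntax error. */
int parse_hex_float(const char *s, double *out)
{
    double mant = 0.0;
    int exp2 = 0;       /* power-of-two scale applied at the end */
    int ndigits = 0;
    int negative = 0;
    int v;

    if (*s == '+' || *s == '-')
        negative = (*s++ == '-');
    if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
        return 0;
    s += 2;

    /* Integer hex digits. */
    for (; (v = hex_value((unsigned char)*s)) >= 0; s++, ndigits++)
        mant = mant * 16.0 + v;

    /* Fraction hex digits: each one scales the result by 2**-4. */
    if (*s == '.') {
        for (s++; (v = hex_value((unsigned char)*s)) >= 0; s++, ndigits++) {
            mant = mant * 16.0 + v;
            exp2 -= 4;
        }
    }
    if (ndigits == 0)
        return 0;

    /* Optional binary exponent, written in decimal. */
    if (*s == 'p' || *s == 'P') {
        int eneg = 0, e = 0;
        s++;
        if (*s == '+' || *s == '-')
            eneg = (*s++ == '-');
        if (!isdigit((unsigned char)*s))
            return 0;
        for (; isdigit((unsigned char)*s); s++)
            if (e < 100000)     /* clamp; ldexp saturates far earlier */
                e = e * 10 + (*s - '0');
        exp2 += eneg ? -e : e;
    }
    if (*s != '\0')
        return 0;

    *out = ldexp(negative ? -mant : mant, exp2);
    return 1;
}
```

For example, "0x1.8p1" accumulates the mantissa 0x18 = 24 with a scale of 2**-4 from the one fraction digit, and the exponent p1 brings the scale to 2**-3, giving 3.0. The output direction (what C99 spells "%a") is a separate problem, as noted above.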

p5pRT commented 10 years ago

From @jhi

Then again, dtoa.c is not without its share of security problems, see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2009-0689 (with many links to follow)

I also got a pointer to another implementation:

http://git.musl-libc.org/cgit/musl/tree/src/internal/floatscan.c

Much more minimal (good), and it seemingly handles 80-bit long doubles (good). But while musl in general has good quality and reputation (good), how well has this been tested for the edge cases? And there are more IEEE (or IEEE-ish) long double types than the 80-bit one.

p5pRT commented 10 years ago

From @jhi

After more inspection: it seems that the gdtoa.tgz (also from netlib) is the "portable" version that handles more platforms, including long doubles.

p5pRT commented 8 years ago

From @jhi

Some more commentary.

Some users of the dtoa.c:

* Python: https://hg.python.org/cpython/file/tip/Python/dtoa.c -- possibly relevant Python ticket: http://bugs.python.org/issue9009
* Ruby: https://github.com/ruby/ruby/blob/trunk/util.c (dtoa.c relevant bits merged in here)
* PHP: https://github.com/php/php-src/blob/master/Zend/zend_strtod.c
* and multiple Mozilla projects: https://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/dtoa.c

I also tried test building gdtoa on OS X with clang, more than a year ago now, and found some nits, fixed in the attached patch. I sent the patch to David Gay but haven't heard back from him in a year or more.

Note that it is important to use the latest gdtoa (20131209), since the older ones have known issues, e.g.

http://www.exploringbinary.com/how-strtod-works-and-sometimes-doesnt/ (see end)

or are tricky to compile (though Perl is saved from this, since we explicitly avoid strict aliasing, for our own good):

http://patrakov.blogspot.com/2009/03/dont-use-old-dtoac.html

And no\, the official gdtoa is not in any kind of version control system. You get a date-stamped tar.

Rick Regan's blog is required reading on these matters, see e.g.

http://www.exploringbinary.com/inconsistent-rounding-of-printed-floating-point-numbers/

p5pRT commented 8 years ago

From @jhi

gdtoa-20131209-patches.tgz

p5pRT commented 8 years ago

From @jhi

I also tried test building gdtoa in OS X with clang, more than a year ago now, and found some nits fixed in the attached patch. I sent the patch to David Gay but haven't heard back from him, in a year or more.

My apologies to Mr Gay, it hasn't been more than a year, just a few months. Don't know where I pulled the one year from.

p5pRT commented 8 years ago

From @jhi

Adding link to https://rt-archive.perl.org/perl5/Ticket/Display.html?id=127182 since relevant discussion also there.

p5pRT commented 8 years ago

From @jhi

gdtoa 20160219 update from Mr Gay:

http://www.ampl.com/netlib/fp/gdtoa.tgz
http://www.ampl.com/netlib/fp/changes