Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

support hexadecimal floats #13966

Closed p5pRT closed 9 years ago

p5pRT commented 9 years ago

Migrated from rt.perl.org#122219 (status was 'resolved')

Searchable as RT122219$

p5pRT commented 9 years ago

From @jhi

[resubmitting since I think the grues ate my first attempt]

Perl could support hexadecimal floats:

* literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
* printf %a %A
* input (PV->NV): "0xh.hhhpnnn" + 3

Lack of %a noted by Dan Kogai: https://groups.google.com/d/msg/perl.perl5.porters/c84JU0olnbQ/YwQczyrqE2YJ

Pointer given by Dan: http://en.wikipedia.org/wiki/Printf_format_string#Type

Possibly useful resource: http://www.exploringbinary.com/hexadecimal-floating-point-constants/ found by quick googling.

Ruby does support the %a %A as noted by Dan, and Python has float.hex() and float.fromhex().

p5pRT commented 9 years ago

From @jhi

> * literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1

Oops, 0.01

p5pRT commented 9 years ago

From @iabyn

On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:

> Perl could support hexadecimal floats:
>
> * literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1

Wouldn't that change the meaning of existing legal syntax: e.g.

    print 0x1.10;

which currently prints "110", but would change to print "1.0625"

-- No matter how many dust sheets you use, you will get paint on the carpet.
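
The two values mentioned so far (the corrected 0.01 and the proposed 1.0625 reading of 0x1.10) can be checked against a C library that already understands the notation. A minimal sketch, assuming a C99 `strtod` that accepts hexadecimal floats and IEEE 754 doubles; the helper name `hexfloat_value` is made up for illustration:

```c
#include <stdlib.h>

/* Parse a hexadecimal float string with the C library.
 * Assumes a C99-conforming strtod(); returns 0.0 if nothing was parsed. */
static double hexfloat_value(const char *text)
{
    char *end = NULL;
    double value = strtod(text, &end);
    return (end == text) ? 0.0 : value;
}
```

Under those assumptions, `hexfloat_value("0x1.10p+0")` is 1 + 1/16 = 1.0625, and `hexfloat_value("0x1.47ae147ae147bp-7")` compares equal to the double literal `0.01`.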

p5pRT commented 9 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 9 years ago

From @ilmari

Dave Mitchell <davem@iabyn.com> writes:

> On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:
>
>> Perl could support hexadecimal floats:
>>
>> * literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
                      ^^^^^^^^^                        ^^^
>
> Wouldn't that change the meaning of existing legal syntax: e.g.
>
>     print 0x1.10;
>
> which currently prints "110", but would change to print "1.0625"

    $ perl -e 'print 0x1.10p+0'
    Bareword found where operator expected at -e line 1, near "10p"
            (Missing operator before p?)
    syntax error at -e line 1, near "10p
    "
    Execution of -e aborted due to compilation errors.

-- "A disappointingly low fraction of the human race is, at any given time, on fire." - Stig Sandbeck Mathisen

p5pRT commented 9 years ago

From @jhi

On Thursday-201407-03, 8:26, Dagfinn Ilmari Mannsåker via RT wrote:

> Dave Mitchell <davem@iabyn.com> writes:
>
>> On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:
>>
>>> Perl could support hexadecimal floats:
>>>
>>> * literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
>                      ^^^^^^^^^                        ^^^
>>
>> Wouldn't that change the meaning of existing legal syntax: e.g.
>>
>>     print 0x1.10;
>>
>> which currently prints "110", but would change to print "1.0625"
>
>     $ perl -e 'print 0x1.10p+0'
>     Bareword found where operator expected at -e line 1, near "10p"
>             (Missing operator before p?)
>     syntax error at -e line 1, near "10p
>     "
>     Execution of -e aborted due to compilation errors.

Yeah, I think the 'p' (hmm, is that 'P' with %A?) is a mandatory part of the package.

p5pRT commented 9 years ago

From @Hugmeir

On Thu, Jul 3, 2014 at 2:34 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:

> On Thursday-201407-03, 8:26, Dagfinn Ilmari Mannsåker via RT wrote:
>
>> Dave Mitchell <davem@iabyn.com> writes:
>>
>>> On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:
>>>
>>>> Perl could support hexadecimal floats:
>>>>
>>>> * literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
>>                      ^^^^^^^^^                        ^^^
>>>
>>> Wouldn't that change the meaning of existing legal syntax: e.g.
>>>
>>>     print 0x1.10;
>>>
>>> which currently prints "110", but would change to print "1.0625"
>>
>>     $ perl -e 'print 0x1.10p+0'
>>     Bareword found where operator expected at -e line 1, near "10p"
>>             (Missing operator before p?)
>>     syntax error at -e line 1, near "10p
>>     "
>>     Execution of -e aborted due to compilation errors.
>
> Yeah, I think the 'p' (hmm, is that 'P' with %A?) is a mandatory part of the package.

    sub deadbeefp () {3} 0x1.deadbeefp+0

Personally, I think adding the construct + a deprecation warning for pathological cases is a good enough (tm) tradeoff.

p5pRT commented 9 years ago

From @iabyn

On Thu, Jul 03, 2014 at 01:25:54PM +0100, Dagfinn Ilmari Mannsåker wrote:

> Dave Mitchell <davem@iabyn.com> writes:
>
>> On Wed, Jul 02, 2014 at 07:49:46PM -0700, Jarkko Hietaniemi wrote:
>>
>>> Perl could support hexadecimal floats:
>>>
>>> * literals: 0xh.hhhp[+-]?NNN, e.g. 0x1.47ae147ae147bp-7 is 0.1
>                      ^^^^^^^^^                        ^^^
>>
>> Wouldn't that change the meaning of existing legal syntax: e.g.
>>
>>     print 0x1.10;
>>
>> which currently prints "110", but would change to print "1.0625"
>
>     $ perl -e 'print 0x1.10p+0'
>     Bareword found where operator expected at -e line 1, near "10p"
>             (Missing operator before p?)
>     syntax error at -e line 1, near "10p
>     "
>     Execution of -e aborted due to compilation errors.

Ah sorry, didn't spot the p.

-- You're only as old as you look.

p5pRT commented 9 years ago

From @jhi

>> Yeah, I think the 'p' (hmm, is that 'P' with %A?) is a mandatory part of the package.
>
>     sub deadbeefp () {3} 0x1.deadbeefp+0

You have a twisted mind, and this is a compliment.

> Personally, I think adding the construct + a deprecation warning for pathological cases is a good enough (tm) tradeoff.

Based on http://grep.cpan.me/?q=0x%5B0-9a-f%5D%2B%5C.%5B0-9a-f%5D%2Bp%5B%2B-%5D%5Cd%2B (that's /0x[0-9a-f]+\.[0-9a-f]+p[+-]\d+/) I wouldn't bother even with a warning. (All the hits seem to be in modules which already somehow try to handle this currently non-native format.)

p5pRT commented 9 years ago

From @jhi

So I did some hacking to get this working for at least *printf and literals, and two patches are attached. I cheated and just punted to using sprintf/strtod.

However: the "hexadecimal floats" support seems to be quite... interesting. As in "interesting times" interesting.

So it's a C99 feature. Output with sprintf %a %A, input with strtod (or strtold). In theory.

The attached patches (and their tests) work with:

* OSX x86
* Linux x86
* Linux x86 -Duselongdouble

(I *think* the output side at least did work in win32, but the win32 smoker must be overwhelmed or something, I seem to get no results)

But cracks start to appear...

* OS X x86 with -Duselongdouble has differences in the *printf output
* Solaris x86 fails completely on input (as if strtod would not parse hexfloats at all, haven't dug into it)

On the output side differences come easily since we are talking about floats: the exponent may float. 0x1.999999999999ap-4 is 0xc.ccccccccccccccdp-7 (Linux "normal" doubles vs "long doubles")

But even what the basic %a means seems to be up to interpretation:

    not ok 1420 - '%a' '1' -> '0x1.0000000000000p+0' cf '0x1p+0'

(Solaris)

But if strtod is not working, I don't feel like rewriting David Gay's dtoa.c (which is the canonical strtod source for many operating systems, like BSD, and which other OSS projects use): http://www.netlib.org/fp/dtoa.c

If output is not working (or needs to be standardized), we need to dig into the fp bits ourselves. I found this from NetBSD: https://github.com/rumpkernel/netbsd-userspace-src/blob/master/lib/libc/gdtoa/hdtoa.c
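
The floating exponent described above means that output strings cannot be compared textually across platforms; comparing the parsed values is one way around that. A rough sketch, assuming a C99 `strtod` with hexadecimal float support; `same_hexfloat_value` is a hypothetical helper, and the 1e-9 tolerance simply mirrors the `within(..., 1e-9)` checks in the attached tests:

```c
#include <math.h>
#include <stdlib.h>

/* Return 1 if two hexadecimal float strings denote (nearly) the same
 * value once parsed as doubles, else 0.  The absolute tolerance is an
 * assumption borrowed from the attached test file, not a general rule. */
static int same_hexfloat_value(const char *a, const char *b)
{
    double x = strtod(a, NULL);
    double y = strtod(b, NULL);
    return fabs(x - y) <= 1e-9;
}
```

With this, the "normal double" spelling 0x1.999999999999ap-4 and the "long double" spelling 0xc.ccccccccccccccdp-7 both count as 0.1, and the Solaris style 0x1.0000000000000p+0 counts as the same value as 0x1p+0.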

p5pRT commented 9 years ago

From @jhi

0001-Hexfloat-sprintf-a-A-part-of-perl-122219.patch

```diff
From aab62f78c4f785265ec874e220e45ec4a0653b06 Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi
Date: Wed, 30 Jul 2014 21:59:57 -0400
Subject: [PATCH 1/2] Hexfloat sprintf %a/%A, part of perl #122219

Just punt the task to system printf, do whatever it does for %a/%A
(%[efgEFG] are handled likewise).

Let me count the ways this can go wrong:
(1) long doubles
(2) no %a (it's C99)
(2) different implementations of %a
(3) broken implementations of %a
(5) IEEE 754 does not define endianness (big, little, mixed (some arms))
(6) non-IEEE-754 formats (vax, cray, ibm, ...)
---
 pod/perlfunc.pod |  7 +++++++
 sv.c             | 38 ++++++++++++++++++++++++++++++--------
 t/op/sprintf.t   |  4 ++--
 t/op/sprintf2.t  | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 94 insertions(+), 11 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 173615b..877dc71 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -7109,6 +7109,8 @@ In addition, Perl permits the following widely-supported conversions:
    %p    a pointer (outputs the Perl value's address in hexadecimal)
    %n    special: *stores* the number of characters output so far
          into the next argument in the parameter list
+   %a    hexadecimal floats
+   %A    like %a, but using upper-case letters
 
 Finally, for backward (and we do mean "backward") compatibility, Perl
 permits these unnecessary but widely-supported conversions:
@@ -7125,6 +7127,11 @@ exponent less than 100 is system-dependent: it may be three or less
 (zero-padded as necessary). In other words, 1.23 times ten to the
 99th may be either "1.23e99" or "1.23e099".
 
+Note that the hexadecimal digits produced by C<%a> and C<%A> are
+system-dependent: most machines use the 64-bit IEEE 754 double
+precision floating point, but some do not. Watch out especially
+for the C Perl configuration option.
+
 Between the C<%> and the format letter, you may specify several
 additional attributes controlling the interpretation of the format.
 In order, these are:
diff --git a/sv.c b/sv.c
index afd4376..df2f54c 100644
--- a/sv.c
+++ b/sv.c
@@ -11376,6 +11376,7 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
         case 'e': case 'E':
         case 'f':
         case 'g': case 'G':
+        case 'a': case 'A':
             if (vectorize)
                 goto unknown;
 
@@ -11428,14 +11429,30 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
             /* nv * 0 will be NaN for NaN, +Inf and -Inf, and 0 for anything
                else. frexp() has some unspecified behaviour for those three */
             if (c != 'e' && c != 'E' && (nv * 0) == 0) {
-                i = PERL_INT_MIN;
-                /* FIXME: if HAS_LONG_DOUBLE but not USE_LONG_DOUBLE this
-                   will cast our (long double) to (double) */
-                (void)Perl_frexp(nv, &i);
-                if (i == PERL_INT_MIN)
-                    Perl_die(aTHX_ "panic: frexp");
-                if (i > 0)
-                    need = BIT_DIGITS(i);
+                i = PERL_INT_MIN;
+                /* FIXME: if HAS_LONG_DOUBLE but not USE_LONG_DOUBLE this
+                   will cast our (long double) to (double) */
+                (void)Perl_frexp(nv, &i);
+                if (i == PERL_INT_MIN)
+                    Perl_die(aTHX_ "panic: frexp");
+                if (c == 'a' || c == 'A') {
+                    /* This computation probably overshoots,
+                     * but that is better than undershooting. */
+                    need +=
+                        (nv < 0) + /* possible unary minus */
+                        2 + /* "0x" */
+                        2 + /* "1." */
+                        /* We want one byte per each 4 bits in the
+                         * mantissa. This works out to about 0.83
+                         * bytes per NV decimal digit (of 4 bits):
+                         * (NV_DIG * log(10)/log(2)) / 4 */
+                        ((NV_DIG * 5) / 6 + 1) +
+                        2 + /* "p+" */
+                        (i >= 0 ? BIT_DIGITS(i) : 1 + BIT_DIGITS(-i)) +
+                        1; /* \0 */
+                } else if (i > 0) {
+                    need = BIT_DIGITS(i);
+                }
                 /* if i < 0, the number of digits is hard to predict. */
             }
             need += has_precis ? precis : 6; /* known default */
@@ -11573,6 +11590,11 @@ Perl_sv_vcatpvfn_flags(pTHX_ SV *const sv, const char *const pat, const STRLEN p
 
                 STORE_LC_NUMERIC_SET_TO_NEEDED();
 
+                /* XXX Configure test for sprintf %a/%A support.
+                 * It is a C99 feature, but might be implemented elsewhere.
+                 * The bad news is that if there is no support,
+                 * we would need to implement %a/%A ourselves. */
+
                 /* hopefully the above makes ptr a very constrained format
                  * that is safe to use, even though it's not literal */
                 GCC_DIAG_IGNORE(-Wformat-nonliteral);
diff --git a/t/op/sprintf.t b/t/op/sprintf.t
index 4c41b16..234a7d6 100644
--- a/t/op/sprintf.t
+++ b/t/op/sprintf.t
@@ -179,7 +179,7 @@ __END__
 >%6. 6s< >''< >%6. 6s INVALID REDUNDANT< >(See use of $w in code above)<
 >%6 .6s< >''< >%6 .6s INVALID REDUNDANT<
 >%6.6 s< >''< >%6.6 s INVALID REDUNDANT<
->%A< >''< >%A INVALID REDUNDANT<
+>%A< >0< >< >tested in sprintf2.t skip: all<
 >%B< >2**32-1< >11111111111111111111111111111111<
 >%+B< >2**32-1< >11111111111111111111111111111111<
 >%#B< >2**32-1< >0B11111111111111111111111111111111<
@@ -213,7 +213,7 @@ __END__
 >%#X< >2**32-1< >0XFFFFFFFF<
 >%Y< >''< >%Y INVALID REDUNDANT<
 >%Z< >''< >%Z INVALID REDUNDANT<
->%a< >''< >%a INVALID REDUNDANT<
+>%a< >0< >< >tested in sprintf2.t skip: all<
 >%b< >2**32-1< >11111111111111111111111111111111<
 >%+b< >2**32-1< >11111111111111111111111111111111<
 >%#b< >2**32-1< >0b11111111111111111111111111111111<
diff --git a/t/op/sprintf2.t b/t/op/sprintf2.t
index 6fd0bde..72bde57 100644
--- a/t/op/sprintf2.t
+++ b/t/op/sprintf2.t
@@ -12,7 +12,54 @@ BEGIN {
 eval { my $q = pack "q", 0 };
 my $Q = $@ eq '';
 
-plan tests => 1406 + ($Q ? 0 : 12);
+# %a and %A depend on the floating point config
+# This totally doesn't test non-IEEE-754 float formats.
+my @hexfloat;
+if ($Config{nvsize} == 8) { # IEEE-754, we hope, the most common out there
+    @hexfloat = (
+        [ '%a', '0', '0x0p+0' ],
+        [ '%a', '1', '0x1p+0' ],
+        [ '%a', '1.0', '0x1p+0' ],
+        [ '%a', '3.14', '0x1.91eb851eb851fp+1' ],
+        [ '%a', '-1.0', '-0x1p+0' ],
+        [ '%a', '-3.14', '-0x1.91eb851eb851fp+1' ],
+        [ '%a', '0.1', '0x1.999999999999ap-4' ],
+        [ '%a', '2**-10', '0x1p-10' ],
+        [ '%a', '2**10', '0x1p+10' ],
+        [ '%a', '1e-9', '0x1.12e0be826d695p-30' ],
+        [ '%a', '1e9', '0x1.dcd65p+29' ],
+        [ '%13a', '3.14', '0x1.91eb851eb851fp+1' ],
+        [ '%.7a', '3.14', '0x1.91eb852p+1' ],
+        [ '%.8a', '3.14', '0x1.91eb851fp+1' ],
+        [ '%.20a', '3.14', '0x1.91eb851eb851f0000000p+1' ],
+        [ '%20.10a', '3.14', '   0x1.91eb851eb8p+1' ],
+        [ '%20.15a', '3.14', '0x1.91eb851eb851f00p+1' ],
+        [ '%A', '3.14', '0X1.91EB851EB851FP+1' ],
+        );
+} elsif ($Config{nvsize} == 16) { # x86 long double, at least
+    @hexfloat = (
+        [ '%a', '0', '0x0p+0' ],
+        [ '%a', '1', '0x8p-3' ],
+        [ '%a', '1.0', '0x8p-3' ],
+        [ '%a', '3.14', '0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%a', '-1.0', '-0x8p-3' ],
+        [ '%a', '-3.14', '-0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%a', '0.1', '0xc.ccccccccccccccdp-7' ],
+        [ '%a', '2**-10', '0x8p-13' ],
+        [ '%a', '2**10', '0x8p+7' ],
+        [ '%a', '1e-9', '0x8.9705f4136b4a597p-33' ],
+        [ '%a', '1e9', '0xe.e6b28p+26' ],
+        [ '%13a', '3.14', '0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%.7a', '3.14', '0xc.8f5c28fp-2' ],
+        [ '%.8a', '3.14', '0xc.8f5c28f6p-2' ],
+        [ '%.20a', '3.14', '0xc.8f5c28f5c28f5c300000p-2' ],
+        [ '%20.10a', '3.14', '   0xc.8f5c28f5c3p-2' ],
+        [ '%20.15a', '3.14', '0xc.8f5c28f5c28f5c3p-2' ],
+        [ '%A', '3.14', '0XC.8F5C28F5C28F5C3P-2' ],
+        );
+}
+
+plan tests => 1406 + ($Q ? 0 : 12) + @hexfloat;
 
 use strict;
 use Config;
@@ -336,3 +383,10 @@ is $o::count, '1', 'sprinf %1s overload count';
 $o::count = 0;
 () = sprintf "%.1s", $o;
 is $o::count, '1', 'sprinf %.1s overload count';
+
+for my $t (@hexfloat) {
+    my ($format, $arg, $expected) = @$t;
+    $arg = eval $arg;
+    my $result = sprintf($format, $arg);
+    is($result, $expected, "'$format' '$arg' -> '$result' cf '$expected'");
+}
-- 
1.8.5.2 (Apple Git-48)
```

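The buffer-size estimate in the sv.c hunk can be restated for a plain C double using the constants in `<float.h>`. A rough sketch, assuming the normalised `0x1.hhhp[+-]d` output style and the default precision only (an explicit precision such as %.20a needs more room); `hexfloat_buf_need` and `decimal_digits` are hypothetical helper names, not Perl internals:

```c
#include <float.h>
#include <stddef.h>

/* Number of decimal digits needed to print a non-negative int. */
static size_t decimal_digits(int n)
{
    size_t digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}

/* Upper bound, in bytes, for formatting any finite double with a
 * default-precision "%a", including the terminating NUL.  Like the
 * patch, this prefers overshooting to undershooting. */
static size_t hexfloat_buf_need(void)
{
    int max_exp = DBL_MAX_EXP;                     /* e.g. 1024 */
    int min_exp = -(DBL_MIN_EXP - DBL_MANT_DIG);   /* e.g. 1074 */
    int exp_mag = max_exp > min_exp ? max_exp : min_exp;

    return 1                             /* possible unary minus */
         + 2                             /* "0x" */
         + 2                             /* "1." */
         + (DBL_MANT_DIG + 3) / 4        /* one hex digit per 4 mantissa bits */
         + 2                             /* "p+" or "p-" */
         + decimal_digits(exp_mag)       /* exponent digits */
         + 1;                            /* \0 */
}
```

On a platform with IEEE 754 doubles this works out to 26 bytes, slightly more than the 25 needed for the longest default output such as -0x1.fffffffffffffp+1023.
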
p5pRT commented 9 years ago

From @jhi

0002-Hexfloat-literals-part-of-perl-122219.patch

```diff
From 4d7069f0e1cf210e0cf8a3385cfb5e5716a5303b Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi
Date: Thu, 31 Jul 2014 12:37:58 -0400
Subject: [PATCH 2/2] Hexfloat literals, part of perl #122219

Punt to strtod/strtold, just like with decimal floats.
The hexfloat support is C99 feature, like its converse %a/%A.
---
 MANIFEST         |   1 +
 pod/perldata.pod |   8 +++++
 pod/perldiag.pod |  17 ++++++++++
 t/op/hexfloat.t  |  78 +++++++++++++++++++++++++++++++++++++++++++
 t/op/sprintf2.t  |   8 +++++
 toke.c           | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 6 files changed, 203 insertions(+), 9 deletions(-)
 create mode 100644 t/op/hexfloat.t

diff --git a/MANIFEST b/MANIFEST
index 54c5bea..5b99b16 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5086,6 +5086,7 @@ t/op/hash-rt85026.t See if hash iteration/deletion works
 t/op/hash.t See if the complexity attackers are repelled
 t/op/hashwarn.t See if warnings for bad hash assignments work
 t/op/heredoc.t See if heredoc edge and corner cases work
+t/op/hexfloat.t See if hexadecimal float literals work
 t/op/inccode.t See if coderefs work in @INC
 t/op/inccode-tie.t See if tie to @INC works
 t/op/incfilter.t See if the source filters in coderef-in-@INC work
diff --git a/pod/perldata.pod b/pod/perldata.pod
index d8edfe9..40d3336 100644
--- a/pod/perldata.pod
+++ b/pod/perldata.pod
@@ -402,6 +402,7 @@ integer formats:
     0xdead_beef # more hex
     0377 # octal (only numbers, begins with 0)
     0b011011 # binary
+    0x1.999ap-4 # hexadecimal floating point
 
 You are allowed to use underscores (underbars) in numeric literals
 between digits for legibility (but not multiple underscores in a row:
@@ -425,6 +426,13 @@ Hexadecimal, octal, or binary, representations in string literals
 representation. The hex() and oct() functions make these conversions
 for you. See L and L for more details.
 
+Hexadecimal floating point is useful for accurately presenting
+floating point values, avoiding conversions to or from decimal floating
+point, and therefore avoiding possible loss in precision. Notice
+that while most current platforms use 64-bit IEEE 754 floating point,
+not all do. For example x86 platforms can be configured with "long doubles",
+which are not compatible with normal "doubles".
+
 You can also embed newlines directly in your strings, i.e., they can end
 on a different line than they begin. This is nice, but if you forget
 your trailing quote, the error will not be reported until Perl finds
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index e41c8cc..d3553bd 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -2172,6 +2172,23 @@ created on an emergency basis to prevent a core dump.
 (F) The parser has given up trying to parse the program after 10 errors.
 Further error messages would likely be uninformative.
 
+=item Hexadecimal float malformed: '%s'
+
+(W syntax) Hexadecimal float literals (like 0x12.34p5) are unsupported
+in this system.
+
+=item Hexadecimal float overflow: '%s'
+
+(W syntax) Hexadecimal float literal overflowed.
+
+=item Hexadecimal float underflow: '%s'
+
+(W syntax) Hexadecimal float literal underflowed.
+
+=item Hexadecimal float unsupported: '%s'
+
+(F) Hexadecimal float literals (like 0x12.34p5) are unsupported in this system.
+
 =item Hexadecimal number > 0xffffffff non-portable
 
 (W portable) The hexadecimal number you specified is larger than 2**32-1
diff --git a/t/op/hexfloat.t b/t/op/hexfloat.t
new file mode 100644
index 0000000..eb8f6bb
--- /dev/null
+++ b/t/op/hexfloat.t
@@ -0,0 +1,78 @@
+#!./perl
+
+use strict;
+
+BEGIN {
+    chdir 't' if -d 't';
+    require './test.pl';
+}
+
+plan(tests => 38);
+
+# Test hexfloat literals.
+
+is(0x1p0, 1);
+is(0x1.p0, 1);
+is(0x1.0p0, 1);
+
+is(0x1p1, 2);
+is(0x1.p1, 2);
+is(0x1.0p1, 2);
+
+is(0x.1p0, 0.0625);
+is(0x0.1p0, 0.0625);
+
+# Positive exponents.
+is(0x1p2, 4);
+is(0x1p+2, 4);
+
+# Negative exponents.
+is(0x1p-1, 0.5);
+is(0x1.p-1, 0.5);
+is(0x1.0p-1, 0.5);
+
+is(0x1p+2, 4);
+is(0x1p-2, 0.25);
+
+is(0x3p+2, 12);
+is(0x3p-2, 0.75);
+
+# Negative sign.
+is(-0x1p+2, -4);
+is(-0x1p-2, -0.25);
+
+is(0x0.10p0, 0.0625);
+is(0x0.1p0, 0.0625);
+is(0x.1p0, 0.0625);
+
+is(0x12p+3, 144);
+is(0x12p-3, 2.25);
+
+# Hexdigits (lowercase).
+is(0x9p+0, 9);
+is(0xap+0, 10);
+is(0xfp+0, 15);
+is(0x10p+0, 16);
+is(0x11p+0, 17);
+is(0xabp+0, 171);
+is(0xab.cdp+0, 171.80078125);
+
+# Uppercase hexdigits and exponent prefix.
+is(0xAp+0, 10);
+is(0xFp+0, 15);
+is(0xABP+0, 171);
+is(0xAB.CDP+0, 171.80078125);
+
+# Underbars.
+is(0xa_b.c_dp+0, 171.80078125);
+
+# Note that the hexfloat representation is not unique
+# since the exponent can be shifted: no different from
+# 3e4 cf 30e3 cf 30000.
+
+# Needs to use within because of long doubles.
+within(0x1.999999999999ap-4, 0.1, 1e-9);
+within(0xc.ccccccccccccccdp-7, 0.1, 1e-9);
+
+# sprintf %a/%A testing is done in sprintf2.t,
+# trickier than necessary because of long doubles.
diff --git a/t/op/sprintf2.t b/t/op/sprintf2.t
index 72bde57..824c06a 100644
--- a/t/op/sprintf2.t
+++ b/t/op/sprintf2.t
@@ -34,7 +34,11 @@ if ($Config{nvsize} == 8) { # IEEE-754, we hope, the most common out there
         [ '%.20a', '3.14', '0x1.91eb851eb851f0000000p+1' ],
         [ '%20.10a', '3.14', '   0x1.91eb851eb8p+1' ],
         [ '%20.15a', '3.14', '0x1.91eb851eb851f00p+1' ],
+
         [ '%A', '3.14', '0X1.91EB851EB851FP+1' ],
+
+        [ '%a', 0x12.34p5, '0x1.234p+9' ],
+        [ '%a', 0x1_2.3_4p5, '0x1.234p+9' ],
         );
 } elsif ($Config{nvsize} == 16) { # x86 long double, at least
     @hexfloat = (
@@ -55,7 +59,11 @@ if ($Config{nvsize} == 8) { # IEEE-754, we hope, the most common out there
         [ '%.20a', '3.14', '0xc.8f5c28f5c28f5c300000p-2' ],
         [ '%20.10a', '3.14', '   0xc.8f5c28f5c3p-2' ],
         [ '%20.15a', '3.14', '0xc.8f5c28f5c28f5c3p-2' ],
+
         [ '%A', '3.14', '0XC.8F5C28F5C28F5C3P-2' ],
+
+        [ '%a', 0x12.34p5, '0x9.1ap+6' ],
+        [ '%a', 0x1_2.3_4p5, '0x9.1ap+6' ],
         );
 }
 
diff --git a/toke.c b/toke.c
index b0997ef..8454d6f 100644
--- a/toke.c
+++ b/toke.c
@@ -9796,6 +9796,7 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
     bool floatit; /* boolean: int or float? */
     const char *lastub = NULL; /* position of last underbar */
     static const char* const number_too_long = "Number too long";
+    bool hexfloat = FALSE;
 
     PERL_ARGS_ASSERT_SCAN_NUM;
 
@@ -9909,6 +9910,14 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
                     /* make sure they said 0x */
                     if (shift != 4)
                         goto out;
+
+                    if (s[1] == '.' &&
+                        /* hexfloat? peekahead to avoid matching ".." */
+                        (isXDIGIT(s[2]) || s[1] == 'p' || s[2] == 'P')) {
+                        s++;
+                        goto out;
+                    }
+
                     b = (*s++ & 7) + 9;
 
                     /* Prepare to put the digit we have onto the end
@@ -9977,6 +9986,25 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
                                   sv, NULL, NULL, 0);
             else if (PL_hints & HINT_NEW_BINARY)
                 sv = new_constant(start, s - start, "binary", sv, NULL, NULL, 0);
+            if (*s == '.' || *s == 'p' || *s == 'P') {
+                /* sloppy (on the underbars) but quick detection of
+                 * hexfloats, the decimal detection will be more
+                 * thorough. */
+                const char* h = s;
+                if (*h == '.') {
+                    h++;
+                    while (isXDIGIT(*h) || *h == '_') h++;
+                }
+                if (*h == 'p' || *h == 'P') {
+                    h++;
+                    if (*h == '+' || *h == '-')
+                        h++;
+                    if (isDIGIT(*h)) {
+                        hexfloat = TRUE;
+                        goto decimal;
+                    }
+                }
+            }
         }
         break;
 
@@ -9989,10 +10017,16 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
       decimal:
         d = PL_tokenbuf;
         e = PL_tokenbuf + sizeof PL_tokenbuf - 6; /* room for various punctuation */
-        floatit = FALSE;
+        floatit = FALSE;
+        if (hexfloat) {
+            floatit = TRUE;
+            *d++ = '0';
+            *d++ = 'x';
+            s = start + 2;
+        }
 
         /* read next group of digits and _ and copy into d */
-        while (isDIGIT(*s) || *s == '_') {
+        while (isDIGIT(*s) || (hexfloat && isXDIGIT(*s)) || *s == '_') {
             /* skip underscores, checking for misplaced ones
                if -w is on
             */
@@ -10032,7 +10066,8 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
 
             /* copy, ignoring underbars, until we run out of digits.
             */
-            for (; isDIGIT(*s) || *s == '_'; s++) {
+            for (; isDIGIT(*s) || (hexfloat && isXDIGIT(*s)) ||
+                     *s == '_'; s++) {
                 /* fixed length buffer check */
                 if (d >= e)
                     Perl_croak(aTHX_ "%s", number_too_long);
@@ -10058,12 +10093,21 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
         }
 
         /* read exponent part, if present */
-        if ((*s == 'e' || *s == 'E') && strchr("+-0123456789_", s[1])) {
-            floatit = TRUE;
+        if (((*s == 'e' || *s == 'E') || (*s == 'p' || *s == 'P')) &&
+            strchr("+-0123456789_", s[1])) {
+            floatit = TRUE;
+
+            /* regardless of whether user said 3E5 or 3e5, use lower 'e',
+               ditto for p (hexfloats) */
+            if ((*s == 'e' || *s == 'E')) {
+                /* At least some Mach atof()s don't grok 'E' */
+                *d++ = 'e';
+            } else if ((*s == 'p' || *s == 'P')) {
+                *d++ = 'p';
+            }
+
             s++;
 
-            /* regardless of whether user said 3E5 or 3e5, use lower 'e' */
-            *d++ = 'e'; /* At least some Mach atof()s don't grok 'E' */
 
             /* stray preinitial _ */
             if (*s == '_') {
@@ -10127,9 +10171,47 @@ Perl_scan_num(pTHX_ const char *start, YYSTYPE* lvalp)
             STORE_NUMERIC_LOCAL_SET_STANDARD();
             /* terminate the string */
             *d = '\0';
-            nv = Atof(PL_tokenbuf);
+            if (hexfloat) {
+                /* for hexfloats, punt to strtod/strtold, or die. */
+                /* XXX Configure test for strtod/strtold hexfloat support.
+                 * It is a C99 feature, but might be implemented elsewhere. */
+                char* endp = PL_tokenbuf;
+                dSAVE_ERRNO;
+                SETERRNO(0,0);
+#if defined(USE_LONG_DOUBLE) && defined(HAS_STRTOLD)
+                nv = strtold(PL_tokenbuf, &endp);
+#elif defined(HAS_STRTOD)
+                nv = strtod(PL_tokenbuf, &endp);
+#else
+                Perl_croak(aTHX_
+                           "Hexadecimal float unsupported: '%s'",
+                           PL_tokenbuf);
+#endif
+                /* XXX test these warnings */
+                /* errno is ERANGE, commonly, but any non-zero
+                 * errno should indicate failure (note that the
+                 * scope above is intentionally tight: set errno
+                 * to zero, call strtod or strtold, inspect errno.) */
+                if (errno) {
+                    if (nv == NV_INF || nv == -NV_INF)
+                        Perl_ck_warner(aTHX_ packWARN(WARN_SYNTAX),
+                                       "Hexadecimal float overflow: '%s'",
+                                       PL_tokenbuf);
+                    else if (nv == 0.0)
+                        Perl_ck_warner(aTHX_ packWARN(WARN_SYNTAX),
+                                       "Hexadecimal float underflow: '%s'",
+                                       PL_tokenbuf);
+                }
+                if (endp == NULL || endp == PL_tokenbuf || *endp)
+                    Perl_ck_warner(aTHX_ packWARN(WARN_SYNTAX),
+                                   "Hexadecimal float malformed: '%s'",
+                                   PL_tokenbuf);
+                RESTORE_ERRNO;
+            } else {
+                nv = Atof(PL_tokenbuf);
+            }
             RESTORE_NUMERIC_LOCAL();
-            sv = newSVnv(nv);
+            sv = newSVnv(nv);
         }
 
         if ( floatit
-- 
1.8.5.2 (Apple Git-48)
```

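The errno handling at the end of the toke.c hunk can be tried outside of Perl. A simplified standalone sketch along the same lines, assuming a C99 `strtod`; the enum and the function name `parse_hexfloat` are invented for illustration, and unlike the patch it returns a single status and treats any range error that is not an infinity as underflow:

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

enum hexfloat_status {
    HEXFLOAT_OK,
    HEXFLOAT_OVERFLOW,
    HEXFLOAT_UNDERFLOW,
    HEXFLOAT_MALFORMED
};

/* Parse text with strtod() and classify the outcome.  The caller's
 * errno is saved and restored around the call, keeping the window in
 * which errno is inspected as tight as the patch does. */
static enum hexfloat_status parse_hexfloat(const char *text, double *out)
{
    char *end = NULL;
    int saved_errno = errno;
    int parse_errno;

    errno = 0;
    *out = strtod(text, &end);
    parse_errno = errno;
    errno = saved_errno;

    /* Nothing consumed, or trailing junk left over. */
    if (end == text || *end != '\0')
        return HEXFLOAT_MALFORMED;
    /* Commonly ERANGE, but any non-zero errno is treated as failure. */
    if (parse_errno != 0)
        return isinf(*out) ? HEXFLOAT_OVERFLOW : HEXFLOAT_UNDERFLOW;
    return HEXFLOAT_OK;
}
```

For example, "0x12.34p5" should come back as HEXFLOAT_OK with the value 582.5, "0x1p+99999" as HEXFLOAT_OVERFLOW, and "0x1.zzp+0" as HEXFLOAT_MALFORMED. Whether a tiny value such as "0x1p-99999" sets errno is left to the C library, so the underflow case is less portable.
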
p5pRT commented 9 years ago

From @arc

Jarkko Hietaniemi via RT <perlbug-comment@perl.org> wrote:

> So I did some hacking to get this working for at least *printf and literals, and two patches are attached.

Excellent — thanks!

> I cheated and just punted to using sprintf/strtod. (I *think* the output side at least did work in win32, but the win32 smoker must be overwhelmed or something, I seem to get no results)

According to this page:

http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.71).aspx

the compiler in Visual Studio 2003 doesn't support %a formats in printf. AIUI, we aim to support VC6, which I assume also doesn't support %a. So I think punting to sprintf/strtod for hex-float support, while admirably tempting from a laziness point of view, may not be a viable approach, at least on win32.

Corrections welcome from anyone who knows anything about win32.

> On the output side differences are easy since we are talking about floats: the exponent may float. 0x1.999999999999ap-4 is 0xc.ccccccccccccccdp-7 (Linux "normal" doubles vs "long doubles")

I think that's not terribly unreasonable. An IEEE double has 53 bits of significand, which can be emitted with a single bit (whose value is 1 except in denormals) before the hexadecimal point, and thirteen hex digits (four bits apiece) after it. An x86 long double, on the other hand, has 63 bits of significand, so emitting 3 bits before the point and 15 nybbles after it seems straightforward.

But I take your point that it's somewhat vexing for these purposes.

> But even what the basic %a means seems to be up to interpretation: not ok 1420 - '%a' '1' -> '0x1.0000000000000p+0' cf '0x1p+0' (Solaris)

That's undeniably a fairly cruddy %a implementation (in the sense that if you wanted all those extra digits you'd surely ask for them) but it's not actually *wrong*. Which is, yes, also vexing for our purposes.

> But if strtod is not working, I don't feel like rewriting David Gay's dtoa.c (which is the canonical strtod source for many operating systems, like BSD, or other OSS projects use): http://www.netlib.org/fp/dtoa.c
>
> If output is not working (or needs to be standardized), we need to dig into the fp bits ourselves. I found this from the NetBSD: https://github.com/rumpkernel/netbsd-userspace-src/blob/master/lib/libc/gdtoa/hdtoa.c

As far as I know, it's possible to implement hex float I/O without bit-banging as long as you've got ldexp, frexp, isnormal, isnan, and isinf. But I doubt very much whether those can reliably be found on older systems that lack hex-float support in strtod and %a in sprintf. :-(

What would happen if we borrowed one of the other implementations wholesale? Are there any licensing issues getting in the way?

-- Aaron Crane ** http://aaroncrane.co.uk/
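
The frexp/ldexp route mentioned above can be illustrated for the output side. A minimal sketch for finite doubles only, assuming C99 `<math.h>`; `hexfloat_format` is a hypothetical name, and the output is always normalised to a leading `0x1`, so subnormals and long doubles will not match what a given libc prints for %a:

```c
#include <math.h>
#include <stdio.h>

/* Format a finite double as [-]0x1[.hhh]p[+-]d using only frexp() and
 * ldexp(), with no access to the bit representation.  Returns what
 * snprintf() returns, or -1 for NaN and infinities. */
static int hexfloat_format(double value, char *buf, size_t size)
{
    static const char hexdigits[] = "0123456789abcdef";
    char frac[32];              /* a 53-bit significand needs at most 13 */
    size_t nfrac = 0;
    int exponent = 0;
    const char *sign = signbit(value) ? "-" : "";
    double m;

    if (isnan(value) || isinf(value))
        return -1;
    if (value == 0.0)
        return snprintf(buf, size, "%s0x0p+0", sign);

    m = frexp(fabs(value), &exponent);  /* m in [0.5, 1) */
    m = ldexp(m, 1) - 1.0;              /* fraction after the leading 1 */
    exponent -= 1;

    /* Peel off four bits at a time; each step is exact in binary. */
    while (m != 0.0 && nfrac < sizeof frac - 1) {
        int digit;
        m = ldexp(m, 4);
        digit = (int)m;
        frac[nfrac++] = hexdigits[digit];
        m -= digit;
    }
    frac[nfrac] = '\0';

    return snprintf(buf, size, "%s0x1%s%sp%+d",
                    sign, nfrac ? "." : "", frac, exponent);
}
```

For IEEE 754 doubles this yields the same strings the first patch expects in the nvsize == 8 case, for example 0x1p+0 for 1 and 0x1.999999999999ap-4 for 0.1. It does not handle precision or width, which a real %a implementation would need.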

p5pRT commented 9 years ago

From @jhi

>> 0x1.999999999999ap-4 is 0xc.ccccccccccccccdp-7 (Linux "normal" doubles vs "long doubles")
>
> I think that's not terribly unreasonable. An IEEE double has 53 bits of significand, which can be emitted with a single bit (whose value is 1 except in denormals) before the hexadecimal point, and thirteen hex digits (four bits apiece) after it. An x86 long double, on the other hand, has 63 bits of significand, so emitting 3 bits before the point and 15 nybbles after it seems straightforward.

I should have included more examples, I think Solaris provided those... it's not just due to long doubles. I don't have a C99 spec in front of me, but I doubt how well defined the format is...

>> But even what the basic %a means seems to be up to interpretation: not ok 1420 - '%a' '1' -> '0x1.0000000000000p+0' cf '0x1p+0' (Solaris)
>
> That's undeniably a fairly cruddy %a implementation (in the sense that if you wanted all those extra digits you'd surely ask for them) but it's not actually *wrong*. Which is, yes, also vexing for our purposes.

For example: what is the '%a' supposed to "optimize for"? As few hexdigits before the "." as possible? Maximize the exponent? Minimize it? Steer it towards the closest/lowest/highest exponent divisible by four? By eight?

> As far as I know, it's possible to implement hex float I/O without bit-banging as long as you've got ldexp, frexp, isnormal, isnan, and isinf. But I doubt very much whether those can reliably be found on older systems that lack hex-float support in strtod and %a in sprintf. :-(

Indeed.

(Which reminds me that our inf/nan support is still a bit dubious.)

> What would happen if we borrowed one of the other implementations wholesale? Are there any licensing issues getting in the way?

BSD licensed code is no problem, we have historically borrowed and used that... mergesort, for example. drand48_r.

For the netlib code, somebody with legal chops would have to take a look for compatibility with Artistic/GPL. Not that I expect any problems, since e.g. Python includes it.

p5pRT commented 9 years ago

From @jhi

Solaris x86 fails completely on input (as if strtod would not parse hexfloats at all\, haven't dug into it)

Now did. Ugh.

In Solaris 10\, strtod must be in "c99 mode" for the hexfloats to be recognized. (strtold is always in this mode). The "c99 mode" is achieved by using "c99" as the Solaris Studio compiler (driver)\, instead of "cc".

In Solaris 9 (or earlier)\, there is no support for hexfloats. (Not blaming Solaris in particular​: I'm pretty certain many older OS releases will be similarly C99-unsupportive.)

If one is not using Solaris Studio cc (something beginning with g\, maybe)\, one can live dangerously and explicitly link in either of /usr/lib/{32\,64}/values-xpg6.o and get the "c99 strtod". Dangerous living because probably many other things get "upgraded"\, too.

Executive summary​: using the netlib dtoa.c (*) is starting to sound even more siren-like.

(*) an odd name\, given that it's a strtod implementation...

p5pRT commented 9 years ago

From @jhi

[dtoa.c] an odd name\, given that it's a strtod implementation...

Good news\, everyone... the netlib dtoa.c contains *both* strtod() and dtoa()\, the latter useable for sprintfing.

It is quite widely used​: Python\, PHP\, and *Java*; and Chrome\, Firefox\, and Safari.

More useful reading​: http​://www.exploringbinary.com/how-strtod-works-and-sometimes-doesnt/ (note that this article is 2 years old\, the bugs referred to have been corrected)

p5pRT commented 9 years ago

From @jhi

From https://rt-archive.perl.org/perl5/Ticket/Display.html?id=122482:

And since we are not really depending on the system strtod​:s anyway (except for nan/inf)\, it looks like for the hexadecimal fp "strtod-ing" it would be better just to implement our own. This would not\, however\, solve the hexadecimal fp output.
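For a sense of how small a self-contained hex float reader can be, here is a hedged Python sketch. It is not the parser that went into perl: the name `parse_hex_float` is invented, it requires the 'p' exponent, and it relies on ldexp, so it is only exact when the hex digits fit in a double's 53 bits.

```python
import math
import re

_HEXFLOAT = re.compile(
    r"([+-]?)0[xX]([0-9a-fA-F]*)(?:\.([0-9a-fA-F]*))?[pP]([+-]?[0-9]+)"
)

def parse_hex_float(text):
    """Parse strings such as '0x1.47ae147ae147bp-7' into a float."""
    m = _HEXFLOAT.fullmatch(text)
    if m is None or not (m.group(2) or m.group(3)):
        raise ValueError("not a hexadecimal float: %r" % text)
    sign, int_part, frac_part, exp = m.groups()
    frac_part = frac_part or ""
    # Treat all hex digits as one integer, then move the point back
    # by four bits for every digit that followed it.
    mantissa = int((int_part + frac_part) or "0", 16)
    exponent = int(exp) - 4 * len(frac_part)
    value = math.ldexp(mantissa, exponent)
    return -value if sign == "-" else value

print(parse_hex_float("0x1.47ae147ae147bp-7"))
```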

On the hexadecimal output the killer wording in the C99 seems to be that trailing zeros *may* be printed. And this is what Solaris does\, but glibc (Linux)\, and whatever is used in OS X\, do not.
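When comparing output across platforms, one way to cope with the optional trailing zeros is to canonicalise the strings before comparing them. The Python sketch below shows that idea only; the function name is made up, and it is not how perl's test suite handles the difference.

```python
import re

def strip_trailing_zero_nybbles(text):
    """Drop optional trailing zero digits from a hex float string."""
    m = re.fullmatch(
        r"([+-]?0[xX][0-9a-fA-F]+)(?:\.([0-9a-fA-F]*))?([pP][+-]?[0-9]+)", text
    )
    if m is None:
        raise ValueError("not a hexadecimal float: %r" % text)
    head, frac, exp = m.groups()
    frac = (frac or "").rstrip("0")
    return head + ("." + frac if frac else "") + exp

# The Solaris-style and glibc-style spellings of 1 compare equal afterwards.
print(strip_trailing_zero_nybbles("0x1.0000000000000p+0"))
```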

p5pRT commented 9 years ago

From @cpansprout

On Thu Jul 03 08​:18​:01 2014\, jhi wrote​:

Yeah\, I think the 'p' (hmm\, is that 'P' with %A?) is a mandatory part of the package.

sub deadbeefp () {3} 0x1.deadbeefp+0

You have a twisted mind\, and this is a compliment.

Personally\, I think adding the construct + a deprecation warning for pathological cases is a good enough (tm) tradeoff.

Based on http://grep.cpan.me/?q=0x%5B0-9a-f%5D%2B%5C.%5B0-9a-f%5D%2Bp%5B%2B-%5D%5Cd%2B (that's /0x[0-9a-f]+\.[0-9a-f]+p[+-]\d+/) I wouldn't bother even with a warning. (All the hits seem to be in modules which already somehow try to handle this currently non-native format.)

This came up on the list a couple of years ago. At the time I think the consensus was to allow parser plugins to extend the syntax\, instead of hard-coding one of them into toke.c.

When we first tried to reserve this syntax (or something similar) by deprecating 0xf00 followed by a dot\, several cases showed up in the perl tests themselves. I think they got changed\, masking the fact that such syntax already occurs in real life.

Now this is all from memory without actually looking anything up....

--

Father Chrysostomos

p5pRT commented 9 years ago

From @jhi

This came up on the list a couple of years ago. At the time I think the consensus was to allow parser plugins to extend the syntax\, instead of hard-coding one of them into toke.c.

Having looked at the toke.c now for a while\, I think the plugin plan is wishful thinking unless something drastic happens first.

When we first tried to reserve this syntax (or something similar) by deprecating 0xf00 followed by a dot\, several cases showed up in the perl tests themselves. I think they got changed\, masking the fact that such syntax already occurs in real life.

I would find that surprising... the "pEXPONENT" part is currently a syntax error.

p5pRT commented 9 years ago

From @jhi

For better or worse\, I have now submitted

- http://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
- http://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
- http://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl\, scalbnl\, or ldexp
- http://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
- http://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats\, without depending on C99 or using system printf/strtod.

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

p5pRT commented 9 years ago

From @craigberry

On Thu\, Aug 14\, 2014 at 6​:56 AM\, Jarkko Hietaniemi via RT \perlbug\-comment@&#8203;perl\.org wrote​:

For better or worse\, I have now submitted

- http://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
- http://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
- http://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl\, scalbnl\, or ldexp
- http://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
- http://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats\, without depending on C99 or using system printf/strtod.

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

Would there be any advantage in toke.c to using Uquad_t or U64TYPE (where available) rather than UV for the chunk that holds the mantissa? The size chosen for Perl's integers doesn't necessarily reflect what's available on the platform.

p5pRT commented 9 years ago

From @jhi

On Thursday-201408-14\, 8​:51\, Craig A. Berry wrote​:

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

Would there be any advantage in toke.c to using Uquad_t or U64TYPE (where available) rather than UV for the chunk that holds the mantissa? The size chosen for Perl's integers doesn't necessarily reflect what's available on the platform.

Ah\, good point. As a matter of fact\, I use that very fact in sv.c already (look for MANTISSATYPE). I'll take a look in a couple of days once we see how widespread the damage caused by this first batch is.

(I also need to think more carefully about what happens/should happen at floating point "extremities" like Inf and NaN.)

p5pRT commented 9 years ago

From @arc

Jarkko Hietaniemi via RT \perlbug\-comment@&#8203;perl\.org wrote​:

For better or worse\, I have now submitted

- http://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
- http://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
- http://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl\, scalbnl\, or ldexp
- http://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
- http://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats\, without depending on C99 or using system printf/strtod.

Hurrah! Thanks very much for this.

Earlier in this ticket\, Brian Fraser pointed out the existence of cases like this​:

sub ap1 { 'z' } is 0x1.ap1\, '1z';

Jarkko reports having found no such affected code using grep.cpan.me\, and I freely stipulate that any code whose meaning changes in the presence of hex float literals (like this example) would be somewhat pathological. However\, I do find myself wondering whether hex float literals should be accepted only in the presence of a suitable feature.

Any thoughts? Am I worrying unnecessarily?

-- Aaron Crane ** http​://aaroncrane.co.uk/

p5pRT commented 9 years ago

From @jhi

On Thursday-201408-14\, 9​:13\, Aaron Crane wrote​:

However\, I do find myself wondering whether hex float literals should be accepted only in the presence of a suitable feature.

I would wait for Andreas' CPAN smokes.

p5pRT commented 9 years ago

From @jhi

On Thursday-201408-14\, 9​:13\, Aaron Crane wrote​:

Jarkko reports having found no such affected code using grep.cpan.me\,

... and off-hand\, all the hits from grep.cpan.me seem to be in strings\, comments\, or alien formats (MATLAB).

p5pRT commented 9 years ago

From @khwilliamson

On 08/14/2014 05​:56 AM\, Jarkko Hietaniemi via RT wrote​:

For better or worse\, I have now submitted

- http://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
- http://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
- http://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl\, scalbnl\, or ldexp
- http://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
- http://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats\, without depending on C99 or using system printf/strtod.

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

Thanks for this.

Looking at the code\, one minor thing jumped out at me\, and that is we now have in handy.h two macros XDIGIT_VALUE(c) and READ_XDIGIT(s) (originally contributed by Yves IIRC) that I think are both faster and clearer than using PL_hexdigit\, and all previous core uses of strchr() and PL_hexdigit had been converted to use these.

p5pRT commented 9 years ago

From @jhi

On Thursday-201408-14\, 13​:24\, Karl Williamson wrote​:

Looking at the code\, one minor thing jumped out at me\, and that is we now have in handy.h two macros XDIGIT_VALUE(c) and READ_XDIGIT(s)

Thanks\, adding to the "followup todo" notes I'm keeping on this.

p5pRT commented 9 years ago

From @jhi

I now pushed a bunch of cleanups for this (thanks for all the comments)\, including a fix for the one serious bug found so far: the code was broken on little-endian :-( [with usual 64-bit IEEE 754 double] but H.Merijn's HP-UX (PA) showed me the error of my ways.

I also tried to prepare for weirder combinations like having no quads to extract the mantissa bits to\, or the double-double platforms (which currently don't really extract the bits from the double-doubles but instead lossily use the frexp+ldexp path).

p5pRT commented 9 years ago

From @jhi

On Thursday-201408-14\, 9​:16\, Jarkko Hietaniemi wrote​:

I would wait for Andreas' CPAN smokes.

Andreas reports no breakages.

p5pRT commented 9 years ago

From @demerphq

On 14 August 2014 19​:24\, Karl Williamson \public@&#8203;khwilliamson\.com wrote​:

On 08/14/2014 05​:56 AM\, Jarkko Hietaniemi via RT wrote​:

For better or worse\, I have now submitted

- http://perl5.git.perl.org/perl.git/commit/dc91db6 Configure scan for the kind of long double we have
- http://perl5.git.perl.org/perl.git/commit/688e39e5 Configure scan for ldexpl
- http://perl5.git.perl.org/perl.git/commit/98181445 Perl_ldexp is one of ldexpl\, scalbnl\, or ldexp
- http://perl5.git.perl.org/perl.git/commit/40bca5ae9 Hexadecimal float sprintf
- http://perl5.git.perl.org/perl.git/commit/61e61fbc Hexadecimal float literals

which implement hexadecimal floats\, without depending on C99 or using system printf/strtod.

The dc91db6 will probably contain many bad guesses for the non-Configure platforms. Only smokes will tell.

Thanks for this.

Looking at the code\, one minor thing jumped out at me\, and that is we now have in handy.h two macros XDIGIT_VALUE(c) and READ_XDIGIT(s) (originally contributed by Yves IIRC) that I think are both faster and clearer than using PL_hexdigit\, and all previous core uses of strchr() and PL_hexdigit had been converted to use these.

Which I see you craftily rewrote to be more efficient. :-)

Nice stuff Karl.

I always learn cool bit-twiddling tricks from your code. It's nice.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 9 years ago

From @jhi

On Thursday-201408-14\, 22​:51\, Jarkko Hietaniemi wrote​:

I now pushed a bunch of cleanups for this (thanks for all the comments)\, including a fix for the one serious bug found so far: the code was broken on little-endian :-( [with usual 64-bit IEEE 754 double] but H.Merijn's HP-UX (PA) showed me the error of my ways.

I also tried to prepare for weirder combinations like having no quads to extract the mantissa bits to\, or the double-double platforms (which currently don't really extract the bits from the double-doubles but instead lossily use the frexp+ldexp path).

And another batch of cleanups. I now bravely think that big-endian works\, and that the "double-double" (e.g. AIX) also works.

Remaining issues​:

- Windows running on Itanium? The canned configs all say that no long double for you\, though. But Itanium does have hardware IEEE 754 "quadruples". No compiler support?

- VMS? Runs across three architectures: Itanium or Alpha or VAX. I assumed 128-bit "true" IEEE 754 for all of them (and little-endian).

- the double-double support code was basically a wild guess\, and even if it works\, the sprintf2 doesn't test for it.

p5pRT commented 9 years ago

From @craigberry

On Fri\, Aug 15\, 2014 at 10​:14 AM\, Jarkko Hietaniemi \jhi@&#8203;iki\.fi wrote​:

- VMS? Runs across three architectures​: Itanium or Alpha or VAX. I assumed 128-bit "true" IEEE 754 for all of them (and little-endian).

On OpenVMS I64 as of v5.21.2-156-gd8bcb4d with -Duse64bitint -Duselongdouble I get​:

$ perl -e "$x = sprintf(qq/%A/\, 0);" assert error​: expression = vend \< vdig + sizeof(vdig)\, in file D0​:[craig.blead]sv.c;1 at line 11759

Dunno what's wrong yet.

p5pRT commented 9 years ago

From @craigberry

On Fri\, Aug 15\, 2014 at 4​:10 PM\, Craig A. Berry \craig\.a\.berry@&#8203;gmail\.com wrote​:

On Fri\, Aug 15\, 2014 at 10​:14 AM\, Jarkko Hietaniemi \jhi@&#8203;iki\.fi wrote​:

- VMS? Runs across three architectures​: Itanium or Alpha or VAX. I assumed 128-bit "true" IEEE 754 for all of them (and little-endian).

On OpenVMS I64 as of v5.21.2-156-gd8bcb4d with -Duse64bitint -Duselongdouble I get​:

$ perl -e "$x = sprintf(qq/%A/\, 0);" assert error​: expression = vend \< vdig + sizeof(vdig)\, in file D0​:[craig.blead]sv.c;1 at line 11759

Dunno what's wrong yet.

The VMS debugger shows the following​:

SV\Perl_sv_vcatpvfn_flags\%LINE 96933\vend​: 2060475744 SV\Perl_sv_vcatpvfn_flags\%LINE 96933\vdig[0​:31]   [0]-[31]​: 0 2060475712 DBG> evaluate sizeof(vdig) 32 DBG> evaluate vend \< vdig + sizeof(vdig) %DEBUG-I-SCALEADD\, pointer addition​: scale factor of 1 applied to right argument 0

So the assertion 2060475744 \< 2060475712 + 32 is false because the LHS is actually equal\, not less than\, the RHS. I don't understand the code well enough to know what that means.
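The arithmetic can be redone in a few lines. The Python sketch below only replays the numbers from the debugger session; the variable names mirror the C identifiers in the assertion, and it says nothing about whether the assertion or the pointer-advancing code is the part that needs changing.

```python
# Values copied from the VMS debugger session quoted above.
vdig = 2060475712          # address of the start of the buffer
vdig_size = 32             # sizeof(vdig)
vend = 2060475744          # where the write pointer ended up

one_past_end = vdig + vdig_size

print(vend - vdig)             # 32: every slot of the buffer was consumed
print(vend < one_past_end)     # False: the assertion as written fails
print(vend == one_past_end)    # True: vend is exactly one past the last slot
```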

p5pRT commented 9 years ago

From @jhi

On Friday\, August 15\, 2014\, Craig A. Berry \craig\.a\.berry@&#8203;gmail\.com wrote​:

On Fri\, Aug 15\, 2014 at 4​:10 PM\, Craig A. Berry \<craig.a.berry@​gmail.com \<javascript​:;>> wrote​:

On Fri\, Aug 15\, 2014 at 10​:14 AM\, Jarkko Hietaniemi \<jhi@​iki.fi \<javascript​:;>> wrote​:

- VMS? Runs across three architectures​: Itanium or Alpha or VAX. I assumed 128-bit "true" IEEE 754 for all of them (and little-endian).

On OpenVMS I64 as of v5.21.2-156-gd8bcb4d with -Duse64bitint -Duselongdouble I get​:

$ perl -e "$x = sprintf(qq/%A/\, 0);" assert error​: expression = vend \< vdig + sizeof(vdig)\, in file D0​:[craig.blead]sv.c;1 at line 11759

Dunno what's wrong yet.

The VMS debugger shows the following​:

SV\Perl_sv_vcatpvfn_flags\%LINE 96933\vend​: 2060475744 SV\Perl_sv_vcatpvfn_flags\%LINE 96933\vdig[0​:31] [0]-[31]​: 0 2060475712 DBG> evaluate sizeof(vdig) 32 DBG> evaluate vend \< vdig + sizeof(vdig) %DEBUG-I-SCALEADD\, pointer addition​: scale factor of 1 applied to right argument 0

So the assertion 2060475744 \< 2060475712 + 32 is false because the LHS is actually equal\, not less than\, the RHS. I don't understand the code well enough to know what that means.

Neither do I\, I just recently wrote it...

That means that for some reason v (the pointer for the hexdigits\, really 0-15\, not the '0'..'f') has extended all the way to the end of the buffer. I see why\, I think... I will push a branch.

-- There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen

p5pRT commented 9 years ago

From @sisyphus

-----Original Message----- From​: Craig A. Berry

$ perl -e "$x = sprintf(qq/%A/\, 0);" assert error​: expression = vend \< vdig + sizeof(vdig)\, in file D0​:[craig.blead]sv.c;1 at line 11759

At least that one works correctly for me on (debian wheezy) powerpc64 perl-5.21.3\, built from yesterday's git with -Duselongdouble (double-double).

Here's some values that don't look right\, however​:

For 1e-298\, the 2 doubles (most significant first) are 0210be08d0527e1d and 0000000069c4b77f\, both of which are positive values.

If I do 'printf "%A"\, 1e-298;' then I get​: 0XB.E08D0527E1D000069C4B77FP-991

Those 4 zeroes in the middle are wrong - they should appear at the end. (This probably just means that the value of the exponent of the least significant double has been miscalculated.) But I think it's also incorrect at the start. The most significant 13 bits of the mantissa (including the implied leading '1') are 1000010111110 - which doesn't correlate at all well with 0XB.E0. Data::Float::DoubleDouble gives the following hex value of the double-double 1e-298: +0x1.0be08d0527e1d69c4b77f000000p-990

(In the Data​::Float​::DoubleDouble representation\, I opted to have the first character be the leading 0 or 1 .... which leaves 105 bits .... which needs 27 hex characters\, the last of which can only be either 8 or 0 (as the last 3 bits are always zero). I did that to retain some correlation between the representation of the value\, and the actual hex-encoding of the double-double. And then\, as it turns out\, C's "%La" does exactly the same formatting\, which is quite fortuitous ... hell\, I didn't even know C was capable of hex formatting of double-doubles until just now !)

Another value I looked at was 193e-3. In this case the 2 doubles are 3fc8b4395810624e and bc56872b020c49ba - the first of which is a positive value; the second being *negative*. Therefore the actual value of the double-double is going to be less than the value of the most significant double. However\, 'printf "%A"\, 193e-3;' outputs​: 0XB.4395810624E872B020C49BAP-4

Again\, the prefix looks wrong - most significant 13 bits are 1100010110100. Also\, if the most significant double ends in "4395810624E" we would expect that\, following the subtraction\, we would see "4395810624D" (or less)\, but we still see "4395810624E" in there.

Data​::Float​::DoubleDouble says +0x1.8b4395810624dd2f1a9fbe76c8cp-3 (and I'll have to investigate how the final hex char came to be something other than "8" or "0" ;-)

I also looked at 2 ** 200. That came out as 0X0P+0. I'm guessing it has looked at the mantissa\, seen only zeroes \, forgotten about the implied leading "1"\, and decided the value was zero.

The fourth value I looked at was 2 ** 0.5. As with 193e-3\, the least significant double is negative - which again seems to have been overlooked. The 2 doubles are 3ff6a09e667f3bcd and bc9bdd3413b26456\, and 'printf "%A"\, 2 ** 0.5;' outputs: 0XA.09E667F3BCDDD3413B26456P-1. The correct value is 0x1.6a09e667f3bcc908b2fb1366ea8p0.

The actual script I ran is attached (try.pl)\, but to run it you'll need to be on a machine whose long double is double-double\, and whose perl was built with -Duselongdouble. Also attached is the output of the script (out.txt).

Btw\, I've just checked that the above Data​::Float​::DoubleDouble values agree with C's "%La" output\, and they do - except for the final "c" in the second example (which should be 8 ... and I'll have to work out how that 107th bit got set.)
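The expected strings can be reproduced on any machine, without double-double hardware, by adding the two component doubles exactly. The Python sketch below does that with rational arithmetic; it is an independent cross-check, not the Data::Float::DoubleDouble algorithm, and the function names are invented. It mirrors the format quoted above (explicit sign, one leading bit, 27 hex digits) and truncates rather than rounds.

```python
import struct
from fractions import Fraction

def double_from_bits(hex_bits):
    """Interpret 16 hex characters as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_bits))[0]

def double_double_hex(hi_bits, lo_bits, frac_digits=27):
    # Fraction(float) is exact, so this is the exact value of the pair.
    total = Fraction(double_from_bits(hi_bits)) + Fraction(double_from_bits(lo_bits))
    if total == 0:
        return "+0x0p0"
    sign = "-" if total < 0 else "+"
    total = abs(total)
    # Find e such that 1 <= total / 2**e < 2.
    e = total.numerator.bit_length() - total.denominator.bit_length()
    if Fraction(2) ** e > total:
        e -= 1
    scaled = total / Fraction(2) ** e
    # Digits after the hex point, truncated to frac_digits.
    frac = int((scaled - 1) * 16 ** frac_digits)
    return "%s0x1.%0*xp%d" % (sign, frac_digits, frac, e)

# 2 ** 0.5: the second double is negative, so it is subtracted.
print(double_double_hex("3ff6a09e667f3bcd", "bc9bdd3413b26456"))
```

Run on the four pairs of doubles listed in this comment, the sketch gives the same strings as the Data::Float::DoubleDouble values quoted above, including the trailing "c" for 193e-3.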

Thanks for taking this on\, Jarkko. Apologies that I haven't come up with something more constructive than "this is wrong and that ain't right".

Cheers\, Rob

p5pRT commented 9 years ago

From @sisyphus

1000010111110000010001101000001010010011111100001110101101001110001001011011 10111111100000000000000000000000 The 2 doubles (most siginificant first)​: (+) 0210be08d0527e1d\, (+) 0000000069c4b77f 0210be08d0527e1d0000000069c4b77f 0XB.E08D0527E1D000069C4B77FP-991 +0x1.0be08d0527e1d69c4b77f000000p-990

1100010110100001110010101100000010000011000100100110111010010111100011010100 11111101111100111011011001000110 The 2 doubles (most siginificant first)​: (+) 3fc8b4395810624e\, (-) bc56872b020c49ba 3fc8b4395810624ebc56872b020c49ba 0XB.4395810624E872B020C49BAP-4 +0x1.8b4395810624dd2f1a9fbe76c8cp-3

1000000000000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000 The 2 doubles (most siginificant first)​: (+) 4c70000000000000\, (+) 0000000000000000 4c700000000000000000000000000000 0X0P+0 +0x1.000000000000000000000000000p200

1011010100000100111100110011001111111001110111100110010010000100010110010111 11011000100110110011011101010100 The 2 doubles (most siginificant first)​: (+) 3ff6a09e667f3bcd\, (-) bc9bdd3413b26456 3ff6a09e667f3bcdbc9bdd3413b26456 0XA.09E667F3BCDDD3413B26456P-1 +0x1.6a09e667f3bcc908b2fb1366ea8p0

p5pRT commented 9 years ago

From @sisyphus

try.pl

p5pRT commented 9 years ago

From @sisyphus

-----Original Message----- From​: sisyphus1@​optusnet.com.au Sent​: Sunday\, August 17\, 2014 8​:40 PM

Another value I looked at was 193e-3. [snip] Data​::Float​::DoubleDouble says +0x1.8b4395810624dd2f1a9fbe76c8cp-3 (and I'll have to investigate how the final hex char came to be something other than "8" or "0" ;-)

I don't think this is central to this thread.

The setting of the last hex char to "c" arises from the (known) perl bug where the value that perl assigns to some NVs is off by one or more ULPs.

As regards 193e-3\, instead of assigning correct doubles (3fc8b4395810624e and bc56872b020c49bc)\, perl has assigned bc56872b020c49ba as the least significant double. This actually means that perl has assigned an illegitimate value to the double-double. I think 3fc8b4395810624ebc56872b020c49ba is not a valid double-double representation - and this is what throws out the calculations performed by D​::F​::DD.

We can force perl to assign the correct double-double representation (and this is the only way of doing it that I know of) by doing​:

use Math​::NV qw(​:all); $nv = nv('193e-3');

If we do that then the correct representation of 3fc8b4395810624ebc56872b020c49bc gets assigned to $nv\, and D::F::DD then provides correct results.

I suppose D::F::DD could strive to detect and correct perl's mistakes\, but that is not a high priority for me.

Cheers\, Rob

p5pRT commented 9 years ago

From @jhi

On Sunday-201408-17\, 6​:40\, sisyphus1@​optusnet.com.au wrote​:

At least that one works correctly for me on (debian wheezy) powerpc64 perl-5.21.3\, built from yesterday's git with -Duselongdouble (double-double).

Here's some values that don't look right\, however​:

For 1e-298\, the 2 doubles (most significant first) are 0210be08d0527e1d and 0000000069c4b77f\, both of which are positive values

The currently-in-blead version is all sorts of wrong for 128-bit IEEE 754 long doubles\, and for double-doubles\, sorry about that. I'm trying to stop breaking things\, with help from Craig.

p5pRT commented 9 years ago

From @jhi

Thanks for taking this on\, Jarkko. Apologies that I haven't come up with something more constructive than "this is wrong and that ain't right".

Get thee the http​://perl5.git.perl.org/perl.git and retry.

It's probably still quite wrong for double-doubles\, but at least it should be less wrong.

p5pRT commented 9 years ago

From @sisyphus

-----Original Message----- From​: Jarkko Hietaniemi

It's probably still quite wrong for double-doubles\, but at least it should be less wrong.

The value expressed for 2 ** 200 is a big improver ;-) It's now at 0X01P199 (which is off by a power of 2).

Of the other values I looked at last night\, they seem to have changed only in the leading digits. What was "0X0A.BCDEF..." has been transformed into "0X01.ABCDEF ..."\, though the correct form begins "0X01.HABCDEF... " (where H stands for some hex digit).

For example\, yesterday's blead presented 1e-298 as​: 0XB.E08D0527E1D000069C4B77FP-991

Today's blead presents it as​: 0X1.BE08D0527E1D000069C4B77FP-991

And the correct rendition is​: 0X1.0BE08D0527E1D69C4B77FP-990

Even for an easily representable float such as 128.625 (where the entire value is held in the most significant double and the least significant double is 0)\, today's blead presents it as 0X1.14P+6\, but correct rendition is 0X1.014P+7.
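The 128.625 case is simple enough to check with Python's float.fromhex(), mentioned at the top of this ticket. The snippet below is just that check: it decodes both strings and shows that only the second one denotes 128.625.

```python
# What blead printed versus what it should have printed for 128.625.
claimed = float.fromhex("0x1.14p+6")
correct = float.fromhex("0x1.014p+7")

print(claimed)            # 69.0
print(correct)            # 128.625
print((128.625).hex())    # Python's own spelling, with all 13 digits
```

The missing zero after the point shifts every later digit up by one nybble, which is why the exponent also comes out one too small.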

Anyway - good luck with it. (It would be nice to see this up and running with double-doubles\, but it's not something that I'm reliant upon.)

Is it not possible for you to achieve the desired result via C's %La/%LA formatting ?

Cheers\, Rob

p5pRT commented 9 years ago

From @jhi

The value expressed for 2 ** 200 is a big improver ;-) It's now at 0X01P199 (which is off by a power of 2).

Of the other values I looked at last night\, they seem to have changed only in the leading digits. What was "0X0A.BCDEF..." has been transformed into "0X01.ABCDEF ..."\, though the correct form begins "0X01.HABCDEF... " (where H stands for some hex digit).

If you could do​:

grep longdblkind config.sh

I'll also email you a test code\, the output of which would be of interest.

For example\, yesterday's blead presented 1e-298 as​: 0XB.E08D0527E1D000069C4B77FP-991

Today's blead presents it as​: 0X1.BE08D0527E1D000069C4B77FP-991

And the correct rendition is​: 0X1.0BE08D0527E1D69C4B77FP-990

Even for an easily representable float such as 128.625 (where the entire value is held in the most significant double and the least significant double is 0)\, today's blead presents it as 0X1.14P+6\, but correct rendition is 0X1.014P+7.

Anyway - good luck with it. (It would be nice to see this up and running with double-doubles\, but it's not something that I'm reliant upon.)

Is it not possible for you to achieve the desired result via C's %La/%LA formatting ?

That would leave us dependent on the vendors' implementations of C99. Two problems with this​:

(1) C99 - which we do not require\, and enabling which often requires various contortions while compiling: a different cc wrapper\, different flags\, different libraries.

(2) there's wiggle room in the spec\, which inevitably leads to diverging implementations. One example of wiggle room is whether to print the trailing zero nybbles. Another is the choice of lead xdigit/exponent alignment. Another huge one is what the heck to do with the long doubles... at least with our own implementation we get to make our own mistakes. (Cue http://xkcd.com/927/)

Cheers\, Rob

p5pRT commented 9 years ago

From @jhi

This is way\, way implemented already.

p5pRT commented 9 years ago

@jhi - Status changed from 'open' to 'resolved'