eyalroz / printf

Tiny, fast(ish), self-contained, fully loaded printf, sprintf etc. implementation; particularly useful in embedded systems.
MIT License

Weak precision performance for floating point values with non-negligible negative exponents #109

Open steven-buytaert opened 2 years ago

steven-buytaert commented 2 years ago

Hi, I noticed that the autotest floating point testing is biased towards large values. I tried the following example against the libc printf.

#include <stdio.h>

// #define PRINTF_ALIAS_STANDARD_FUNCTION_NAMES  0 in following file.
#include "printf_config.h"
#include "../src/printf/printf.h"

void putchar_(char character) { }  // the library expects this symbol at link time; it is not used by sprintf_

int main(int argc, char * argv[]) {

  double fp = 1.380651569e-23; // Boltzmann

  char b[128];

  sprintf (b, "%.16e", fp); printf("libc %s\n", b);
  sprintf_(b, "%.16e", fp); printf("this %s\n", b);

}

Output:

libc 1.3806515690000000e-23
this 1.3806515648376314e-23

As you can see, there's a serious precision issue due to arithmetic underflow. Depending on the application, this might be problematic or not. In any case, it's something you should also check in the test framework.

If floating point precision is/becomes an issue, I have a standalone implementation of ecvt that uses (very wide) integer multiplication/division to format floating point values with the proper precision, based upon GRISU; as such, the formatting code doesn't do any floating point arithmetic itself to format the floats. The total text size for ARM Thumb code with -Os is 1979 bytes, only depending on strcpy from libc. A good reference to this floating point formatting can be found here. This code produces the exact same bit representation of the Boltzmann constant as GNU libc. It can also format 32 bit IEEE 754 directly, without going via the 64 bit IEEE 754 format. You can find this ecvt and the required wide integer arithmetic code here.
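To give a flavour of the approach (a generic illustration only, not my actual ecvt code): the first step of such an integer-only formatter is to pull the significand and binary exponent out of the IEEE 754 bit pattern, without any floating point arithmetic.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Decompose a finite double into an integer significand and a base-2 exponent,
// so that |value| == significand * 2^exp2. Only bit operations are used.
static void decompose(double d, uint64_t * sig, int32_t * exp2, int * negative) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);                      // type-pun via memcpy
  *negative = (int)(bits >> 63);
  uint64_t raw_exp  = (bits >> 52) & 0x7FF;
  uint64_t mantissa = bits & ((UINT64_C(1) << 52) - 1);
  if (raw_exp == 0) {                                  // subnormal or zero
    *sig  = mantissa;
    *exp2 = 1 - 1023 - 52;
  } else {                                             // normal: add the implicit leading bit
    *sig  = mantissa | (UINT64_C(1) << 52);
    *exp2 = (int32_t)raw_exp - 1023 - 52;
  }
}

int main(void) {
  uint64_t sig; int32_t e2; int neg;
  decompose(1.380651569e-23, &sig, &e2, &neg);
  printf("significand = %llu, exp2 = %d, negative = %d\n", (unsigned long long)sig, (int)e2, neg);
}

From there on, the decimal digits can be produced purely by integer scaling of that significand.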

Kind regards,

Steven

jrahlf commented 2 years ago

Interesting find. There is also ryu. I think both yours and ryu do not use any double arithmetic, so it is also interesting from the perspective of PR #84. However, it undermines the single-header approach.

eyalroz commented 2 years ago

@steven-buytaert : Thank you for noticing this.

I will look into it later this week, and try to figure out whether this is a minor bug/oversight somewhere or something more fundamental. Can you list the exact CMake option settings you were using?

In any case, it's something you should also check in the test framework.

Indeed.

steven-buytaert commented 2 years ago

@eyalroz

I did a normal build, as follows:

$ mkdir build;  cd build/
$ cmake -DBUILD_TESTS=ON -DBUILD_STATIC_LIBRARY=ON ..
$ # Changed PRINTF_ALIAS_STANDARD_FUNCTION_NAMES to 0 in the printf_config.h header file.
$ make

And then built the small sample program mentioned above.

I think the issue is fundamental, i.e. underflow happening during the double precision operations. You can increase the precision requirement in the printf_config.h file, but the fundamental issue remains.

What triggered me was that the autotest program only produced positive exponents, never negative ones.
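For instance, a generator that spreads the decimal exponent over the whole range would also exercise small magnitudes (a hypothetical sketch, not the existing autotest code):

#include <math.h>
#include <stdlib.h>

// Hypothetical helper: draw a test value whose decimal exponent is uniform
// over [-300, 300], so tiny magnitudes are exercised as often as large ones.
static double random_test_value(void) {
  double mantissa = 1.0 + 9.0 * ((double)rand() / (double)RAND_MAX);  // in [1, 10]
  int exp10 = (rand() % 601) - 300;                                   // in [-300, 300]
  return mantissa * pow(10.0, exp10);
}

Each generated value can then be formatted with both sprintf and sprintf_ and the two strings compared.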

I was interested in this embedded printf, just because I wrote one myself.

Kind regards,

Steven

eyalroz commented 2 years ago

I think the issue is fundamental, i.e. underflow happening during the double precision operations.

I meant fundamental in the sense that it can't easily be worked around (we already have some workarounds for precision issues in there: playing with the exponent, keeping track of remainders, etc.).

eyalroz commented 2 years ago

Ok, here are the results of my investigation...

the problem (not a bug) occurs in this piece of code:

  double_with_bit_access conv = get_bit_access(abs_number);
  // based on the algorithm by David Gay (https://www.ampl.com/netlib/fp/dtoa.c)
  int exp2 = get_exp2(conv);
      // drop the exponent, so conv.F comes into the range [1,2)
  conv.U = (conv.U & (( (double_uint_t)(1) << DOUBLE_STORED_MANTISSA_BITS) - 1U)) | ((double_uint_t) DOUBLE_BASE_EXPONENT << DOUBLE_STORED_MANTISSA_BITS);
  // now approximate log10 from the log2 integer part and an expansion of ln around 1.5
  exp10 = (int)(0.1760912590558 + exp2 * 0.301029995663981 + (conv.F - 1.5) * 0.289529654602168);
  // now we want to compute 10^exp10 but we want to be sure it won't overflow
  exp2 = (int)(exp10 * 3.321928094887362 + 0.5);
  const double z  = exp10 * 2.302585092994046 - exp2 * 0.6931471805599453;
  const double z2 = z * z;
  conv.U = ((double_uint_t)(exp2) + DOUBLE_BASE_EXPONENT) << DOUBLE_STORED_MANTISSA_BITS;
  // compute exp(z) using continued fractions, see https://en.wikipedia.org/wiki/Exponential_function#Continued_fractions_for_ex
  conv.F *= 1 + 2 * z / (2 - z + (z2 / (6 + (z2 / (10 + z2 / 14)))));
  // correct for rounding errors
  if (abs_number < conv.F) {
    exp10--;
    conv.F /= 10;
  }
}
abs_exp10_covered_by_powers_table = PRINTF_ABS(exp10) < PRINTF_MAX_PRECOMPUTED_POWER_OF_10;
normalization.raw_factor = abs_exp10_covered_by_powers_table ? powers_of_10[PRINTF_ABS(exp10)] : conv.F;

And the two numbers 1.380651569e-23 and 1.380651569e+23 take different paths here: both of them have an exponent beyond what's covered by the precomputed table (PRINTF_MAX_PRECOMPUTED_POWER_OF_10), but for one of them we calculate exp10 to be -22 and apply conv.F /= 10, while for the other we get exp10 as 23, so we don't need to change conv.F. The division applied to conv.F hurts accuracy: it changes the result enough that we get different low digits.
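To illustrate the kind of error that one extra division can introduce, here is a standalone sketch (not code from the library): the literal 1e-23 involves a single rounding, while 1e-22 / 10 rounds twice, so the two routes may not land on the same bit pattern.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Print a double together with its raw bit pattern, for ULP-level comparison.
static void show(const char * label, double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  printf("%-12s %.17e  bits = 0x%016llx\n", label, d, (unsigned long long)bits);
}

int main(void) {
  show("1e-23", 1e-23);              // one rounding: decimal literal to double
  show("1e-22 / 10", 1e-22 / 10.0);  // two roundings: literal, then division
  // Any difference in the low bits shows up as different low digits in the output.
}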

Now we need to think about what to do about this.

Some ideas:

Thoughts?

eyalroz commented 2 years ago

More info: The linear approximation of exp10 for the e-23 gives ~ -22.83, and for the e+23 gives ~ 23.15. So... maybe it's the (int) conversion that's the problem. Hmmm...
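A quick standalone check of that suspicion: the C cast truncates toward zero, so a negative approximation such as -22.83 becomes -22, while 23.15 becomes 23.

#include <math.h>
#include <stdio.h>

int main(void) {
  // The cast truncates toward zero; floor() always rounds down.
  printf("(int)(-22.83) = %d, floor(-22.83) = %.0f\n", (int)(-22.83), floor(-22.83));  // -22 vs -23
  printf("(int)( 23.15) = %d, floor( 23.15) = %.0f\n", (int)( 23.15), floor( 23.15));  //  23 vs  23
}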

eyalroz commented 2 years ago

@steven-buytaert : With the recent commit, your program now yields:

libc 1.3806515690000000e-23
this 1.3806515689994298e-23

i.e. about 12 digits of precision. (Although it exposes an unrelated issue.)

What do you think?

steven-buytaert commented 2 years ago

@eyalroz That is already much closer to the real Boltzmann constant, so it is an improvement, yes. With some more rounding, you could make that 8999 into 9000, but the point is that you don't know where to start and stop rounding. The fundamental issue is that the floating point operations you use are only an approximation. Operations that yield bits beyond the mantissa bits lose precision.
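You can see the effect in isolation with a simple scaling round trip (a standalone sketch, unrelated to your code):

#include <stdio.h>
#include <stdlib.h>

// Scale random doubles up by 1e23 and back down. Each multiply and divide
// rounds the true result to 53 bits, so many inputs do not come back bit-exact.
int main(void) {
  int changed = 0;
  const int trials = 100000;
  for (int i = 0; i < trials; i++) {
    double x = 1.0 + (double)rand() / (double)RAND_MAX;  // in [1, 2]
    if (x * 1e23 / 1e23 != x) changed++;
  }
  printf("%d of %d values changed after scaling up and back down\n", changed, trials);
}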

I therefore decided not to use floating point operations in my ecvt routine.

As I said, it depends on what you want to achieve. If you really need 17 digits of precision, floating point operations are not good enough, unless you can use a higher precision floating point type such as the IEEE 754 128 or 256 bit variants. Of course, to format those themselves, you would still need to fall back to even wider integer arithmetic, as floating point would again be too limiting. In short, to format floating point values with enough precision, you always need a type wider than the one you want to format.

Don't worry too much. Floating point formatting is still a major minefield and black art, as you undoubtedly have found out. :-)

BTW, when I look at David Gay's code, he also seems to be using a much wider floating point type when USE_BF96 is defined, precisely to achieve the precision needed to format 64 bit doubles. When this is enabled, a large table of powers of ten is included, inflating the code size. In my code, I have a very small table, but I do several exact scale-up or scale-down operations to bring the mantissa and the exponent into the proper range.
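As a generic illustration of what such an exact scale step looks like (not my actual code), the core is ordinary multi-precision integer arithmetic with carries between limbs:

#include <stdint.h>
#include <stdio.h>

// A wide unsigned integer stored as eight 32-bit limbs, least significant first.
typedef struct { uint32_t limb[8]; } Wide;

// Multiply the wide integer by a small factor, exactly, propagating carries.
static void wide_mul_small(Wide * w, uint32_t factor) {
  uint64_t carry = 0;
  for (unsigned i = 0; i < 8; i++) {
    uint64_t p = (uint64_t)w->limb[i] * factor + carry;
    w->limb[i] = (uint32_t)p;   // keep the low 32 bits in this limb
    carry      = p >> 32;       // the high 32 bits carry into the next limb
  }
}

int main(void) {
  Wide w = { { 0 } };
  w.limb[0] = 6224402u;                                       // arbitrary starting significand
  for (int i = 0; i < 20; i++) wide_mul_small(&w, 10);        // exact multiplication by 10^20
  for (int i = 7; i >= 0; i--) printf("%08x ", w.limb[i]);    // print the 256-bit result, high limb first
  printf("\n");
  // Every step is an exact integer operation, so no bits are ever lost.
}

Scaling down works the same way with long division, keeping a remainder instead of a carry.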

Regards,

Steven