abolz / Drachennest

Different algorithms for converting binary to decimal floating-point numbers
Boost Software License 1.0
116 stars 7 forks source link

Update Schubfach papers #5

Closed FrankHB closed 2 years ago

FrankHB commented 2 years ago

There are some new versions:

https://drive.google.com/file/d/1JoQBN5igZ8bMI3ua7l5DrHx_GYsjF0Dy

https://drive.google.com/file/d/1IEeATSVnEE6TkrHlCYNY2GjaraBjOT4f

https://github.com/abolz/Drachennest/blob/e6714a39ad331b4489d0b6aaf3968635bff4eb5e/src/schubfach_64.cc#L912

BTW, it may be worth nothing the reason of using different h to the paper (9.10 (10) in v2).

abolz commented 2 years ago

Thanks! I didn't know that a new version of the paper is available.

BTW, it may be worth nothing the reason of using different h to the paper (9.10 (10) in v2).

You're right. I should have documented this. I'll try to add a comment later this weekend. For now: The different h is the result of an experiment to reduce the number of operations required in (9.9) (in v2) by using slightly more efficient 64-bit operations. I even don't remember whether the experiment was successful and whether the current code is actually (much) faster... I'll try to get some time and look into this in some more detail this weekend.

Thanks for the pointers!

abolz commented 2 years ago

I finally found some time to look at this again. I didn't find my old notes and must admit that it took a while to figure out what's going on here :/ Turns out it is simply because of using some more bits in the constants and working with 64- or 128-bit unsigned integers. So here are some notes. (I'm referring to Schubfach4 here.)

  1. The constants in the implementation use 128 bits. The paper requires 126 bits. I have simply tried to reuse the constants from other algorithms here. (It didn't turn out well... all implementations use slighlty different constants.)

  2. With 64- and 128-bit arithmetic at hand it is actually slightly more beneficial to work with a fixed-width fractional part of 129 bits instead of 128 bits.

First of all this requires to use the different h, as you have observed:

h = 129 + (q - 2) + floor(log2 10^-k) - 127 + 1 = q + floor(log2 10^-k) + 1

Next, in order to compute r_o'(x) (Result 23) we need 4 V' and 2 V'. The expressions for these values are then (using the different h) given by (Result 25)

4 V' = (c' g) 2^-128 = b 2^-128
2 V' = (c' g) 2^-129 = b 2^-129

where

b = c' g = c' (g1 2^64 + g0)

Now with 64-/128-bit arithmetic/constants this may be expressed as

b = A1 2^128 + A0 2^64 + B1 2^64 + B0

And this part is (even more) confusing. I'm sorry. I have mixed up the notation in the paper here. Forget the notation in the paper for a while. The code in the implementation computes the most significant 128-bits of b as

b[128...) = y1 2^64 + y0,   where y0, y1 < 2^64

Note that

floor(4 V') = y1

is readily available. Ok... following p. 22 in the paper, it turns out that

2 V' - floor(2 V') < eps = 2^-64    <==>    b[65...129) == 0

Now in the implementation of RoundToOdd we then need to return

floor(4 V')     = y1       iff   b[65...129) == 0
floor(4 V') | 1 = y1 | 1   iff   b[65...129) != 0

The only thing we may need to do after the multiplication is to set bit 0 of y1 to 1. But testing bit 128 of b (which in the code is bit 0 of y1) is not really required. If it is 1, we have b[65...129) != 0 and we would have to set it, but, well, it is already set. So we only need to check whether b[59...128) is non-zero and in this case return y1 | 1. This is exactly the test in the implementation: b[59...128) != 0 is equivalent to y0 > 1.

I hope this makes some more sense now. If you have any further questions, feel free to ask them here.

BTW: If you don't mind, I would like to rename this issue to something like "Differences from the paper".

EDIT: Added some floors