Closed c3d closed 1 month ago
The problem does not exist with 16 digits binary, where we get a result back that is indistinguishable from [ 1 2 3 ]
, to the point where subtracting [ 1 2 3 ]
from it gives [ 0. 0. 0. ]
Conversion to spherical: the differences between 16-digits (HWFP) and 24-digit (decimal) is in the order of [ -5.558375E-17 -6.484281E-16_° 2.4527469576E-10_° ]
.
The last one is a bit suspect: 1E-10 is larger than what we would expect.
The formula for the third term on [ 1 2 3 ]
is acos(3/sqrt(1^2+2^2+3^2))
, or acos(3/sqrt(14)
. sqrt(14)^2 - 14
is in the order of 1E-22
. Note that the difference is zero in binary.
Checked sqrt(14)
at 24 and 36 digits by squaring it and changing the last digit.
In the 24 digits case, changing the last digit up by one makes the square exact, altough doing so is incorrect relativee to the square root at higher precision.
Computed result: 3.74165_73867_73941_38558_375
Rounded up result: 3.74165_73867_73941_38558_376
36-digit result: 3.74165_73867_73941_38558_37487_32316_5493
36-digit rounded up: 3.74165_73867_73941_38558_37487_32316_5494
Squares:
13.99999_99999_99999_99999_99
14.
13.99999_99999_99999_99999_99999_99999_9997
14.00000_00000_00000_00000_00000_00000_0005
In short, the (incorrectly) rounded up result happens to be better for 24 digits, but worse for 36 digits. So no obvious error in the computation of the square root.
The 3/sqrt(14)
result computed at 24 and 36 digits are very close too:
0.80178_37257_37273_15405_36604_42638_26056_5
0.80178_37257_37273_15405_366
0.80178_37257_37273_19
So if anything is wrong here, it's the FP16 case, last digit is 9 and should be 5.
The acos
is where we see the values diverge:
36.69922_52002_89878_84843_76877_42818_7651_°
36.69922_52002_44602_79413_76_°
36.69922_52004_89877_4_°
So here, 36-digit roughly get the same result as 16-digit, but completely different from 24-digit.
Computing with 64 digits gives:
36.69922_52004_89883_01023_17730_82860_8802_°
.
So there is clearly a problem with the acos
computation.
The formula used for acos is atan(sqrt(1-x^2)/x)
.
The value for the atan
argument at 36 digits is
0.74535_59924_99929_89880_30578_89577_09208_5
We are not at some extreme where we could expect to lose precision.
For this value X
, tan(atan(X))-X
is in the order of 1.12990E-16
, which is way higher than what we should get. This means either tan
or atan
is at fault here.
Reduced test case: tan(atan(0.74))-0.74
.
This currently gives:
HW7: 5.96046E-8 (expected) HW16: -1.11022_30246E-16 (expected) VP24: -3.93321_80669E-12 VP36: -5.14785_26033E-17
So I am correct in my expectations that the result can and should be in the order of the precision. We currently have about half.
Tracing shows that argument reduction seems to fail. We enter the main loop with value 0.74, which is above 0.5, so it does not converge quickly enough.
The test for values above 0.5 was tucked inside a test for values above 1.0, so values between 0.5 and 1.0 would not correclty be dealt with.
Identified while working on #1238: The conversion of
[ 1 2 3 ]
to spherical and back gives a result with only about 12 digits correct for the returning value. We get[ 0.99999_99999_94256 1.99999_99999_88513 3.00000_00000_09572 ]