Possible precision loss in sqrt

StephanEggermont commented 5 years ago

On the squeak dev mailing list, Nicolas Cellier posted some code with remarks about losing precision and a fix.

Check if this is also the case in Pharo, and if this might explain some unexpected results

The Trunk: Kernel-nice.1230.mcz commits@source.squeak.org 12 May 2019 at 21:56 Nicolas Cellier uploaded a new version of Kernel to project The Trunk: http://source.squeak.org/trunk/Kernel-nice.1230.mcz ==================== Summary ====================

Name: Kernel-nice.1230 Author: nice Time: 12 May 2019, 9:52:59.138604 pm UUID: c5ccb933-92e3-4969-9bb4-05b6794c16bf Ancestors: Kernel-nice.1229

Name: Kernel-nice.1230 Author: nice Time: 12 May 2019, 2:13:29.878804 am UUID: 56e8a3d6-3899-47b1-b828-cf0525f7dec3 Ancestors: Kernel-nice.1229

Revamp sqrt

1) move falback code for missing primitive 55 or 555 in Float and invoke super if primitive fails 2) handle inf and Nan in this fallback code 3) scale to avoid dramatic loss of precision of original Newton-Raphson loop in case of gradual underflow 4) round the result correctly as mandated by IEEE-754 standard. Since Newton-Raphson converges quadratically, it's at most one more iteration than before, plus a somehow germane adjustment... Don't be spare of comments - I re-invented the wheel and this is documented nowhere else. There are many other ways to evaluate a correctly rounded sqrt, but they are all tricky. Another way would be to rely on LargePositiveInteger sqrtFloor, but it won't be possible anymore, because SmallInteger sqrtFloor will now depend onFloat sqrt, and LargePositiveInteger depends on SmallInteger sqrtFloor (by virtue of divide and conquer) - see below.

5) move sqrt, sqrtFloor and sqrtRem implementation in Integer subclasses, so that each one takes its own responsibilities which are not exactly the same...

6) Accelerate SmallInteger sqrtFloor by invoking Float sqrt (the case of absent primitives apart) Provide long explanation telling how it works in 64bits image despite inexact asFloat 7) Accelerate SmallInteger sqrt by not testing the isFinite etc... Provide long explanation why sqrt truncated works in 64bits image despite inexact asFloat

Note that 6) and 7) are dead simple versus former super version, only the comment is not...

8) Accelerate LargePositiveInteger sqrtRem and sqrtFloor by removing the harcoded threshold: divide & conquer was faster than Newton-Raphson unconditionnally and asFloat truncated is not a viable option for Large 9) remove sqrtFloorNewtonRaphson now that neither Small nor Large integers use it. It's a heartbreaking change, I had super-optimized it along the years...

Note: I could have kept a Newton-Raphson for large, accelerated by seeding the first guess with asFloat sqrt truncated. Up to 106 bits, the single correction of SmallInteger>>sqrtFloor might also work.

10) move the handling of Float overlow/accuracy in LargePositiveInteger sqrt where they belong. It's not about speed-up, it's about putting things at their place.

11) Speed up SmallInteger isAnExactFloat. It's not sqrt related, but I first thought using it before engaging asFloat sqrt truncated, that's the reason why you find it here

Congratulations, you reached the end of this commit message!

=============== Diff against Kernel-nice.1229 ===============

Item was changed: ----- Method: BoxedFloat64>>sqrt (in category 'mathematical functions') ----- sqrt

"Answer the square root of the receiver. Optional. See Object documentation whatIsAPrimitive."

| exp guess eps delta |

<primitive: 55>

^super sqrt!
Numeric.
"Changed 200/01/19 For ANSI support."
"Newton-Raphson"
self <= 0.0
ifTrue: [self = 0.0
ifTrue: [^ 0.0]
ifFalse: ["v Chg"
^ DomainError signal: 'sqrt undefined for number less than zero.']].
"first guess is half the exponent"
exp := self exponent // 2.
guess := self timesTwoPower: 0 - exp.
"get eps value"
eps := guess * Epsilon.
eps := eps * eps.
delta := self - (guess guess) / (guess 2.0).
[delta * delta > eps]
whileTrue:
[guess := guess + delta.
delta := self - (guess guess) / (guess 2.0)].
^ guess!

Item was added:

----- Method: Float>>sqrt (in category 'mathematical functions') -----
sqrt
"Fallback code for absent primitives.
Care to answer a correctly rounded result as mandated by IEEE-754."
| guess selfScaled nextGuess exp secator hi lo remainder maxError |
self <= 0.0
ifTrue: [self = 0.0
ifTrue: [^ 0.0]
ifFalse: [^ DomainError signal: 'sqrt undefined for number less than zero.']].
self isFinite ifFalse: [^self].
"scale to avoid loss of precision in case of gradual underflow
(selfScaled between: 1.0 and: 2.0), so it is a good guess by itself"
exp := self exponent // 2.
guess := selfScaled := self timesTwoPower: exp * -2.
"Use Newton-Raphson iteration - it converges quadratically
(twice many correct bits at each loop)"
[nextGuess := selfScaled - (guess guess) / (2.0 guess) + guess.
"test if both guess are within 1 ulp"
(nextGuess + guess) / 2.0 = guess]
whileFalse:
["always round odd upper - this avoids infinite loop with alternate flip of last bit"
guess := nextGuess + (nextGuess ulp/2.0)].
"adjust the rounding - the guess can be 2 ulp up or 1 ulp down
Let u = guess ulp.
if (guess+u/2)^2<self, then guess is under-estimated
if (guess-u/2)^2>self, then guess is over-estimated
Note that they can't be equal (because left term has 55 bits).
(guess+u/2)^2=guess^2 + guess*u + u^2/4 < self
==> self - guess^2 > guess*u
(guess-u/2)^2=guess^2 - guess*u + u^2/4 > self
==> guess^2 - self >= guess*u
(guess^2 - self) is evaluated with an emulated fused-multiply-add"
["Decompose guess in two 26 bits parts hi,lo
the trick is that they do not necessarily have the same sign
If 53 bits are hi,0,lo => (hi,lo) else hi,1,lo=> (hi+1,-lo)"
secator := "1<<27+1" 134217729.0.
hi := guess * secator.
hi :=hi + (guess - hi).
lo := guess - hi.
"The operations below are all exact"
remainder := selfScaled - hi squared - (hi lo 2.0) - lo squared.
maxError := guess timesTwoPower: 1 - Float precision.
remainder > maxError or: [remainder negated >= maxError]]
whileTrue: [guess :=remainder > 0.0
ifTrue: [guess successor]
ifFalse: [guess predecessor]].
"undo the scaling"
^ guess timesTwoPower: exp!

Item was changed: ----- Method: Integer>>slidingLeftRightRaisedTo:modulo: (in category 'private') ----- slidingLeftRightRaisedTo: n modulo: m "Private - compute (self raisedTo: n) \ m, Note: this method has to be fast because it is generally used with large integers in cryptography. It thus operate on exponent bits from left to right by packets with a sliding window rather than bit by bit (see below)."

| pow j k w index oddPowersOfSelf square |

"Precompute powers of self for odd bit patterns xxxx1 up to length w + 1. The width w is chosen with respect to the total bit length of n, such that each bit pattern will on average be encoutered P times in the whole bit sequence of n. This costs (2 raisedTo: w) multiplications, but more will be saved later (see below)." k := n highBit.

w := (k highBit - 1 >> 1 min: 16) max: 1. oddPowersOfSelf := Array new: 1 << w. oddPowersOfSelf at: 1 put: (pow := self). square := self self \ m. 2 to: oddPowersOfSelf size do: [:i | pow := oddPowersOfSelf at: i put: pow square \ m].

"Now exponentiate by searching precomputed bit patterns with a sliding window" pow := 1.

[k > 0]

whileTrue:

[pow := pow * pow \ m.

"Skip bits set to zero (the sliding window)" (n bitAt: k) = 0

ifFalse:

["Find longest odd bit pattern up to window length (w + 1)" j := k - w max: 1.

[j < k and: [(n bitAt: j) = 0]] whileTrue: [j := j + 1]. "We found an odd bit pattern of length k-j+1; perform the square powers for each bit (same cost as bitwise algorithm); compute the index of this bit pattern in the precomputed powers." index := 0.

[k > j] whileTrue:

[pow := pow * pow \ m.

index := index * 2 + (n bitAt: k).
index := index << 1 + (n bitAt: k). k := k - 1].

"Perform a single multiplication for the whole bit pattern. This saves up to (k-j) multiplications versus a naive algorithm operating bit by bit" pow := pow * (oddPowersOfSelf at: index + 1) \ m]. k := k - 1].

^pow!

Item was removed:

----- Method: Integer>>sqrt (in category 'mathematical functions') -----
sqrt
"Answer the square root of the receiver."
| selfAsFloat floatResult guess |
selfAsFloat := self asFloat.
floatResult := selfAsFloat sqrt.
floatResult isInfinite ifFalse: [
guess := floatResult truncated.
"If got an exact answer, answer it. Otherwise answer float approximate answer."
guess squared = self
ifTrue: [ ^ guess ]].
"In this case, maybe it failed because we are such a big integer that the Float method becomes
inexact, even if we are a whole square number. So, try the slower but more general method"
selfAsFloat >= Float maxExactInteger asFloat squared
ifTrue: [
guess := self sqrtFloor.
guess squared = self ifTrue: [
^guess ].
"Nothing else can be done. No exact answer means answer must be a Float.
Answer the best we have which is the rounded sqrt."
guess := (self * 4) sqrtFloor.
^(guess // 2 + (guess \ 2)) asFloat].
"We need an approximate result"
^floatResult!

Item was changed: ----- Method: Integer>>sqrtFloor (in category 'mathematical functions') ----- sqrtFloor

"Return the integer part of the square root of self Assume self >= 0

The following post-conditions apply:
The following invariants apply: 1) self sqrtFloor squared <= self 2) (self sqrtFloor + 1) squared > self"
self subclassResponsibility!
^self sqrtFloorNewtonRaphson!

Item was removed:

----- Method: Integer>>sqrtFloorNewtonRaphson (in category 'private') -----
sqrtFloorNewtonRaphson
"Return the integer part of the square root of self.
Use a Newton-Raphson iteration since it converges quadratically
(twice more bits of precision at each iteration)"
| guess delta |
guess := 1 bitShift: self highBit + 1 // 2.
[
delta := guess squared - self // (guess bitShift: 1).
delta = 0 ] whileFalse: [
guess := guess - delta ].
^guess - 1!

Item was changed: ----- Method: Integer>>sqrtRem (in category 'mathematical functions') ----- sqrtRem

"Return an array with floor sqrt and sqrt remainder. Assume self >= 0.

The following invariants apply: 1) self sqrtRem first squared <= self 2) (self sqrtRem first + 1) squared > self 3) self sqrtRem first squared + self sqrtRem last = self"

self subclassResponsibility!
| s |
s := self sqrtFloorNewtonRaphson.
^{ s. self - s squared } !

Item was changed: ----- Method: LargePositiveInteger>>sqrt (in category 'mathematical functions') ----- sqrt

"Answer the square root of the receiver.
If the square root is exact, answer an Integer, else answer a Float approximation.
Handle some tricky case of Float overflow or inaccuracy"
| selfAsFloat floatResult integerResult |
selfAsFloat := self asFloat.
floatResult := self asFloat sqrt.
floatResult isFinite
ifTrue:
[self mightBeASquare ifFalse: ["we know for sure no exact solution exists,
just answer the cheap float approximation without wasting time."
^floatResult].
"Answer integerResult in case of perfect square"
integerResult := floatResult truncated.
integerResult squared = self ifTrue: [^integerResult]].
"In this case, maybe it failed because we are such a big integer that the Float method becomes
inexact, even if we are a whole square number. So, try the slower but more general method"
selfAsFloat >= Float maxExactInteger asFloat squared
ifTrue: [
integerResult := self sqrtFloor.
integerResult squared = self ifTrue: [
^integerResult ].
"Nothing else can be done. No exact answer means answer must be a Float.
Answer the best we have which is the rounded sqrt."
integerResult := (self bitShift: 2) sqrtFloor.
^((integerResult bitShift: -1) + (self bitAnd: 1)) asFloat].
"If we know for sure no exact solution exists, then just answer the cheap float approximation without wasting time."
| selfAsFloat |
self mightBeASquare
ifFalse:
[selfAsFloat := self asFloat.
selfAsFloat isFinite ifTrue: [^self asFloat sqrt ]].
"We need an approximate result"
^floatResult!
"If some exact solution might exist, or self asFloat isInfinite, call potentially expensive super"
^super sqrt!

Item was changed: ----- Method: LargePositiveInteger>>sqrtFloor (in category 'mathematical functions') ----- sqrtFloor

"See super. Use a fast divide and conquer strategy."
"Like super, but use a faster divide and conquer strategy if it's worth"
self digitLength >= 64 ifFalse: [^self sqrtFloorNewtonRaphson]. ^self sqrtRem first!

Item was changed: ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') ----- sqrtRem

"See super. Use a divide and conquer method to perform this operation.
"Like super, but use a divide and conquer method to perform this operation. See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"

| n qr q s r sr a3a2 a1 a0 | "Split self in 4 digits a3,a2,a1,a0 in base b, such that most significant digit a3 >= b/4 It is not a problem to have a3 >= b, so we can round b down to a whole number of bytes n" n := self highBit bitShift: -5. "bitShift: -2 divide in 4 parts, bitShift: -3 round down in bytes"

n >= 16 ifFalse: [^super sqrtRem]. a3a2 := self butLowestNDigits: n 2. a1 := self copyDigitsFrom: n + 1 to: n 2. a0 := self lowestNDigits: n.

sr := a3a2 sqrtRem.

qr := (sr last bitShift: 8 * n) + a1 divideByInteger: (sr first bitShift: 1). q := qr first normalize.

s := (sr first bitShift: 8 n) + q. r := (qr last normalize bitShift: 8 n) + a0 - q squared. r negative

ifTrue:

[r := (s bitShift: 1) + r - 1. s := s - 1].

sr at: 1 put: s; at: 2 put: r. ^sr

!

Item was changed: ----- Method: SmallFloat64>>sqrt (in category 'mathematical functions') ----- sqrt

"Answer the square root of the receiver. Optional. See Object documentation whatIsAPrimitive."

| exp guess eps delta |

<primitive: 555>

^super sqrt!
Numeric.
"Changed 200/01/19 For ANSI support."
"Newton-Raphson"
self <= 0.0
ifTrue: [self = 0.0
ifTrue: [^ 0.0]
ifFalse: ["v Chg"
^ DomainError signal: 'sqrt undefined for number less than zero.']].
"first guess is half the exponent"
exp := self exponent // 2.
guess := self timesTwoPower: 0 - exp.
"get eps value"
eps := guess * Epsilon.
eps := eps * eps.
delta := self - (guess guess) / (guess 2.0).
[delta * delta > eps]
whileTrue:
[guess := guess + delta.
delta := self - (guess guess) / (guess 2.0)].
^ guess!

Item was added:

----- Method: SmallInteger>>isAnExactFloat (in category 'testing') -----
isAnExactFloat
"See super.
When you're small, the fastest way is to try"
^self asFloat truncated = self!

Item was changed: ----- Method: SmallInteger>>sqrt (in category 'mathematical functions') ----- sqrt

"Answer the square root of the receiver.
If the square root is exact, answer an Integer, else answer a Float approximation"
| floatResult integerResult | self negative ifTrue: [

^ DomainError signal: 'sqrt undefined for number less than zero.' ].

floatResult := self asFloat sqrt.
integerResult := floatResult truncated.
"Note: truncated works for 60-bit SmallInteger
If self is a square s^2, but asFloat rounds down,
f = s^2(1-e), f^0.5 = s(1-e)^0.5 = s(1-0.5e+O(e^2))
since s asFloat is exact, and e <= 0.5*ulp(1),
s(1-0.5e+O(e^2)) always rounds to s"
integerResult * integerResult = self ifTrue: [^integerResult].
^floatResult!
^super sqrt!

Item was added:

----- Method: SmallInteger>>sqrtFloor (in category 'mathematical functions') -----
sqrtFloor
"See super. Use asFloat sqrt which is known to be exactly rounded.
Adjust the result in case self asFloat is inexact.
An example why it is necessary with 60 bits SmallInteger is:
| i |
i := (1<<28-1) squared - 1.
i asFloat sqrt truncated squared <= i.
What happens is that i and and next perfect square above i, s^2
are rounded to the same Float f >= s^2.
In other words, asFloat did cross the next perfect square boundary.
The guess is at most off by 1, because the next next perfect square is:
(s + 1) squared = (2*s + s squared + 1)
s squared has at most 60 bits, and 2*s has 31 bits in this case,
s squared highBit - (2*s) highBit < Float precision,
so we are sure that next next perfect square is a different Float."
| guess |
guess := self asFloat sqrt truncated.
guess * guess > self ifTrue: [^guess - 1].
^guess!

Item was added:

----- Method: SmallInteger>>sqrtRem (in category 'mathematical functions') -----
sqrtRem
"See super"
| s |
s := self sqrtFloor.
^{s. self - (s*s)}!

StephanEggermont commented 5 years ago

Also see 1232 for a regression

nicolas-cellier-aka-nice commented 5 years ago

The problem of precision is only in fall-back code for the case when primitives are absent. The rest of the commit is about accelerating and cleaning. It's better if coordinated with the accelerated Large Integer Arithmetic that I recently introduced in Squeak. It tooks a few commits to get a satisfying implementation, paintings are not made of a single brush stroke.

khinsen commented 5 years ago

In Pharo, sqrt is implemented as a primitive, so I suppose it uses the processor's FP unit.

nicolas-cellier-aka-nice commented 3 years ago

Also, please use .diff url rather than .mcz, it works far better for web browsing.

https://source.squeak.org/trunk/Kernel-nice.1230.diff

PolyMathOrg / PolyMath

Possible precision loss in sqrt #114

Numeric.

Numeric.