go-spatial / geom

Geometry interfaces to help drive interoperability within the Go geospatial community
MIT License
168 stars 37 forks source link

cmp: epsilon not suitble for large numbers #73

Closed ear7h closed 5 years ago

ear7h commented 5 years ago

The issue

The core of the cmp package is the following lines

https://github.com/go-spatial/geom/blob/c4fa59f5d35660f27d6b71e0ae25eac1816f9dc7/cmp/cmp.go#L62

Where tolerance is generally the package constant TOLERANCE. This approach eases some of the problems of floats, but it breaks if numbers are too large or too small. For example, I am currently working with points in the Natural Earth dataset, the the following comparison fails despite being accurate to 10 significant digits.

cmp.Float(
    -7.784854276e+06,
    -7.78485427595008e+06) // false

proposed solution

The following page provides more details and a solution:

https://floating-point-gui.de/errors/comparison/

There is an alternative to heaping conceptual complexity onto such an apparently simple task: instead of comparing a and b as real numbers, we can think about them as discrete steps and define the error margin as the maximum number of possible floating-point values between the two values.

This is conceptually very clear and easy and has the advantage of implicitly scaling the relative error margin with the magnitude of the values. Technically, it’s a bit more complex, but not as much as you might think, because IEEE 754 floats are designed to maintain their order when their bit patterns are interpreted as integers.

In code, this would be something like:

        // comparing against zero needs extra complexity bc theres
        // the just as many bitpatterns between [0, 1] as [1, inf)
        // and the count method breaks down. So, we turn it into a comparison
        // in the [1, inf) range.
        if f1 == 0 {
                f1 = 1
                f2 = math.Abs(f2) + 1
        }
        if f2 == 0 {
                f1 = math.Abs(f1) + 1
                f2 = 1
        }

        i1 := *(*int64)(unsafe.Pointer(&f1))
        i2 := *(*int64)(unsafe.Pointer(&f2))
        d := i2 - i1
        if d < 0 {
                return d > -tolerance
        } else {
                return d < tolerance
        }

The new tolerance constant could be calculated as the number of steps between 1.000001 and 1.0, which is 4503599627 (see this play). This would make the comparison accurate to 5 decimal places.

ear7h commented 5 years ago

The article I linked referenced another article that is more detailed: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

Also, I have changed the snippet to remove a conditional branch and improve its readability. One issue is that one of epsilon and tolerance need to be derived from the other.

        // The compiler is able to optimize away logic for short-circuiting
        // because the expression does not have any side-effects
        if f1 == 0 || f2 == 0 {
                return math.Abs(f2 - f1) < epsilon
        }

        i1 := int64(math.Float64bits(f1))
        i2 := int64(math.Float64bits(f2))
        d := i2 - i1
        if d < 0 {
                return d > -tolerance
        } else {
                return d < tolerance
        }
ear7h commented 5 years ago

Running a quick benchmark gave me the following results:

name        old time/op    new time/op    delta
CmpFloat-4    4.89ms ± 1%    5.23ms ± 1%  +6.88%  (p=0.000 n=8+8)

name        old alloc/op   new alloc/op   delta
CmpFloat-4     66.7B ± 1%     71.0B ± 0%  +6.42%  (p=0.001 n=7+7)

name        old allocs/op  new allocs/op  delta
CmpFloat-4      0.00           0.00         ~     (all equal)
ARolek commented 5 years ago

@ear7h can this be closed now?