golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: math: add Float32SignificandBits and Float64SignificantBits #66640

Open dsnet opened 6 months ago

dsnet commented 6 months ago

Proposal Details

I propose the addition of the following constants to the math package:

const (
    // Float32SignificandBits is the number of bits in the significand
    // for an IEEE 754 binary32 value.
    // Integers within [-2<<Float32SignificandBits, +2<<Float32SignificandBits]
    // can be exactly represented within a float32.
    Float32SignificandBits = 23

    // Float64SignificandBits is the number of bits in the significand
    // for an IEEE 754 binary64 value.
    // Integers within [-2<<Float64SignificandBits, +2<<Float64SignificandBits]
    // can be exactly represented within a float64.
    Float64SignificandBits = 52
)

When inter-operating with JavaScript, it's common to clamp floating point integers to a certain range to ensure precise representation of integers. However, logic doing this often uses a hardcoded 52 or 53 constant, making it hard to discover where such logic may be occurring. By declaring this as a constant, we can check for all references to math.Float64SignificandBits within a codebase to see what logic may be concerned with integer to float64 conversions.
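As a rough illustration of the clamping logic described above, here is a minimal sketch. The names float64SignificandBits, maxPreciseInt, and clampToPrecise are hypothetical stand-ins, not existing math package identifiers, and the bound follows the proposal's 2<<52 (2^53). JavaScript's own Number.MAX_SAFE_INTEGER is one less than that, so code targeting JavaScript may want the tighter limit.

```go
package main

import "fmt"

// float64SignificandBits is a hypothetical local stand-in for the proposed
// math.Float64SignificandBits; it is not part of the standard library.
const float64SignificandBits = 52

// maxPreciseInt is 2^53. Every integer in [-maxPreciseInt, +maxPreciseInt]
// is exactly representable as a float64.
const maxPreciseInt = int64(1) << (float64SignificandBits + 1)

// clampToPrecise limits n to the range of integers that convert to float64
// without loss.
func clampToPrecise(n int64) int64 {
	switch {
	case n > maxPreciseInt:
		return maxPreciseInt
	case n < -maxPreciseInt:
		return -maxPreciseInt
	}
	return n
}

func main() {
	fmt.Println(clampToPrecise(1<<60) == maxPreciseInt) // true
	fmt.Println(clampToPrecise(-12345))                 // -12345
}
```

With a shared constant, a search for float64SignificandBits finds this logic directly instead of relying on a grep for a bare 52 or 53.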

These constants are very IEEE 754 centered, but I argue that is okay since Float64bits and Float64frombits already exist and are IEEE 754 specific.

KreanXie commented 6 months ago

Makes sense; however, I feel the scope of this constant is quite narrow, and it might only be applicable in specific cases. I think it's more suitable to define such constants within one's own package. Also, the constant name does seem a bit lengthy. Furthermore, given that math.Float64bits and math.Float64frombits already exist, further declarations like this may introduce redundancy.

jfrech commented 6 months ago

In your title, s/Float64SignificantBits/Float64SignificandBits/.

jfrech commented 6 months ago

I would interpret 2<<n as 2**(n+1) and scratch my head regarding the meaning of -2<<n.
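For readers puzzling over the same expression, a small sketch of how Go parses it: unary minus binds tighter than the shift operator, so -2<<n is (-2)<<n, which equals -(2^(n+1)). The function name bounds is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// bounds returns three spellings of the lower bound for a given n.
// All three evaluate to -(2^(n+1)).
func bounds(n uint) (implicit, explicit, power int64) {
	implicit = -2 << n             // parsed as (-2) << n
	explicit = -(int64(2) << n)    // negate after shifting
	power = -(int64(1) << (n + 1)) // the 2**(n+1) reading, negated
	return
}

func main() {
	fmt.Println(bounds(52)) // -9007199254740992 three times
}
```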

jfrech commented 6 months ago

You can already do:

import (
    "math"
    "math/big"
)

var (
    Float32SignificandBits = big.NewFloat(math.MaxFloat32).MinPrec() - 1
    Float64SignificandBits = big.NewFloat(math.MaxFloat64).MinPrec() - 1
)

Jorropo commented 6 months ago

This is hard to parse relative to the simple result it gives, and it's not a constant expression.

define such constants within one's own package

is way better advice imo.

Jorropo commented 6 months ago

What about this instead:

const (
    // float32SignificandBits is the number of bits in the significand
    // for an IEEE 754 binary32 value.
    // Integers within [-2<<float32SignificandBits, +2<<float32SignificandBits]
    // can be exactly represented within a float32.
    float32SignificandBits = 23

    // float64SignificandBits is the number of bits in the significand
    // for an IEEE 754 binary64 value.
    // Integers within [-2<<float64SignificandBits, +2<<float64SignificandBits]
    // can be exactly represented within a float64.
    float64SignificandBits = 52
)

const (
    // Float32SafeInteger represents the highest magnitude continuous integer based on IEEE 754, after which only less than half of the integers are available.
    Float32SafeInteger = 2<<float32SignificandBits

    // Float64SafeInteger represents the highest magnitude continuous integer based on IEEE 754, after which only less than half of the integers are available.
    Float64SafeInteger = 2<<float64SignificandBits
)

It seems to me this is the value we are after, and it communicates the problem more clearly if you don't already know why JavaScript's integers are related to 52.

I can see this being worse than 23 and 52 if you actually want 23 and 52, because bits.Len and friends are not usable in const expressions (but the compiler folds them at compile time if inlined).

jfrech commented 6 months ago

@Jorropo Your descriptions for Float32SafeInteger and Float64SafeInteger are the same; contiguity is a property of a set of integers, not only of its maximum; and talking about "half of the integers" requires serious goodwill to correctly interpret.

dsnet commented 6 months ago

Float32SafeInteger and Float64SafeInteger generally sound fine to me, but one downside of Float32SafeInteger is that it doesn't tell you the bit width, if you need that in your calculation (which we did in our logic). Of course, we can compensate by computing 64 - bits.LeadingZeros64(x) to get the bit width (but we lose the const aspect of the operation).
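A minimal sketch of that workaround, assuming the bound is defined as 2<<52 as in the comment above. The names float64PreciseInteger and significandBits are hypothetical. Note that 64 - bits.LeadingZeros64(x) is the same value as bits.Len64(x), and that the width of 2^53 is 54 bits, so two must be subtracted to get back to 52.

```go
package main

import (
	"fmt"
	"math/bits"
)

// float64PreciseInteger is a hypothetical constant equal to 2^53.
const float64PreciseInteger = 2 << 52

// significandBits recovers the significand width from a bound of the
// form 2<<n. The bound occupies n+2 bits, hence the subtraction.
func significandBits(bound uint64) int {
	return 64 - bits.LeadingZeros64(bound) - 2
}

func main() {
	// The result is only available at run time. Something like
	//   const w = bits.Len64(float64PreciseInteger)
	// does not compile, because a function call is not a constant expression.
	fmt.Println(significandBits(float64PreciseInteger)) // 52
	fmt.Println(bits.Len64(float64PreciseInteger) - 2)  // 52
}
```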

As a bikeshed, perhaps call it Float64PreciseInteger instead? "Safe" seems to imply that going outside this range is "unsafe" like things will panic or something.

talking about "half of the integers" requires serious goodwill to correctly interpret.

The documentation can be adjusted. The goal is to document what range of integers is precise. Other than saying that integers outside this range are imprecise, I don't think we need to document exactly what the behavior is.
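To make "imprecise" concrete, here is a small sketch that checks whether an integer survives a round trip through float64. The helper name roundTrips is hypothetical, and the check is only meant for values around 2^53; for inputs near the int64 limits the conversion back from float64 can overflow.

```go
package main

import "fmt"

// roundTrips reports whether n is unchanged after converting to float64
// and back. Illustrative only: not reliable near the int64 limits.
func roundTrips(n int64) bool {
	return int64(float64(n)) == n
}

func main() {
	const limit = int64(1) << 53 // 2<<52 in the proposal's notation

	fmt.Println(roundTrips(limit))     // true
	fmt.Println(roundTrips(limit + 1)) // false: rounds to 2^53
	fmt.Println(roundTrips(limit + 2)) // true: even values are still exact here
}
```

Beyond the limit some integers remain exact and others do not, which is why documenting only the precise range is the simpler contract.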

dsnet commented 6 months ago

I think it's more suitable to define such constants within one's own package

The challenge with this is that I can't search across a codebase for situations that might care about this type of conversion, since each piece of logic does it slightly differently. Searching for the constants 52, 53, 1<<52, 2<<52, or 1<<53 leads to many false positives. Usually I provide data from the module proxy on how often some constant is used, but I had a hard time filtering out the noise when analyzing this. I know code cares about it, but I can't easily determine how much code.