Open JeffreySarnoff opened 5 years ago
See also #6148
Is there a corresponding section for constants which should defined?
There is no such section in the current standard; however some constants should be available at their most accurate values for given representations and under the various rounding mode to comply best. These come to mind (and there are others which would be advantageous):
π, sqrt(2), exp(1), log(2), log(10), log2(10), log10(2)
, and their inverses.
A finite IEEE754 floating point value is constructible using ldexp(significand, exponent)
where
significand, exponent = frexp(value)
. It would be very helpful to return the nearest binary representation to a decimal significand with a radix 10 exponent e.g. frexp(value, base=10)
.. and similarly with ldexp
.
also missing is negate(x)
negate(x) copies a floating-point operand x to a destination in the same format, reversing the sign bit. negate(x) is not the same as subtraction(0, x). [negate(NaN) forces msb, 0.0 - NaN leaves msb] -- IEEE754-2019 (also in 2008 rev, maybe in original)
Should -x call 0 - x or negate?
In what follows,
Float16
is used as an example. The same holds forFloat32
andFloat64
.
-x
where x::Float16
should be processed as negative(x)
: corrected
negate(x::Float16) = reinterpret(Float16, reinterpret(UInt16, x) | 0x8000)
changesign(x::Float16) = reinterpret(Float16, reinterpret(UInt16, x) ⊻ 0x8000)
negative(x::Float16) = isnan(x) ? x : changesign(x)
with the above this (which is important for proper propagation of NaNs) would hold
msbit(x::Float16) = reinterpret(UInt16, x) & 0x8000
isnan(x) && (msbit(x) === msbit(-x))
[it does not in Julia v1.2]. The relationship is important. The standard speaks to it:
When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however, that operations on bit strings—copy, negate, abs, copySign—specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand.
immediateNaN = NaN
arithmeticNaN = 0 / 0
signbit(immediateNaN) !== signbit(arithmeticNaN) # otherwise the bits are identical
[this signbit distinction holds in Julia 1.2]
signbit(immediateNaN) === false
signbit(arithmeticNaN) === true
with that, proper NaN propagation over arithmetic expressions needs two things
(i) NaN1 op NaN2
--> consistently either NaN1, NaN2, or the NaN of larger payload
(Julia 1.2 has NaN1 op NaN2 --> NaN1
for +, -, *, /
and others)
(ii) see following (Julia 1.2 does have this property)
msbit(x::Float16) = reinterpret(UInt16, x) & 0x8000
isnan(x) && (msbit(x) === msbit(-x))
If this hold, one may determine if a calculation which results in a value that isnan
has encountered an indeterminate expression (0.0/0.0, Inf/Inf, 0.0*Inf, Inf*0.0
and those variously signed) or has encountered a sentinel NaN
or perhaps used an uninitialized value preset to NaN
.
So we have to fix Float16
and the others are fine? I am confused that that -x
destroys
information about what went wrong during computations
julia> reinterpret(Int, (0.0/0.0)) - reinterpret(Int, NaN)
-9223372036854775808
julia> reinterpret(Int, -(0.0/0.0)) - reinterpret(Int, NaN)
0
julia> reinterpret(Int, 0-(0.0/0.0)) - reinterpret(Int, NaN)
-9223372036854775808
The use of Float16
was as an example -- Float32
and Float64
need the same treatment.
The difficulty with -x
is only solved by changing things as I describe in my previous comment.
also [Julia 1.2] does not do this, although the standard specifies the following behavior
abs(x::T) where {T<:IEEEFloat} = isnan(x) ? x : flipsign(x, x)
Along with the correction to -x
described above, to get usable behavior, this is needed too.
What happens with copysign(NaN, x)
is not specified in the standard. To provide propagated support that preserves each of these sorts of non-numbers,
immediateNaN = NaN
arithmeticNaN = 0 / 0
in addition to the handling of -x
given above
abs(x::T) where {T<:IEEEFloat} = isnan(x) ? x : flipsign(x, x)
copysign(x::T, y::T) where {T<:IEEEFloat} =
(signbit(x) === signbit(y) || isnan(x)) ? x : flipsign(x, x)
"Implementations shall provide the following non-computational operations for all supported [floating point] arithmetic formats "
another missing mandated function is class(x)
:
class(x) tells which of the following ten classes x falls into: signalingNaN quietNaN negativeInfinity negativeNormal negativeSubnormal negativeZero positiveZero positiveSubnormal positiveNormal positiveInfinity
and we need totalorder(x::T, y::T) where {T<:IEEEFloat}
totalOrder(x, y) imposes a total ordering on canonical members of the format of x and y: a) If x<y, totalOrder(x, y) is true. b) If x>y, totalOrder(x, y) is false. c) If x=y: 1) totalOrder(−0, +0) is true. 2) totalOrder(+0, −0) is false. 3) If x and y represent the same floating-point datum: i) If x and y have negative sign, totalOrder(x, y) is true if and only if the exponent of x ≥ the exponent of y ii) otherwise, totalOrder(x, y) is true if and only if the exponent of x ≤ the exponent of y. d) If x and y are unordered numerically because x or y is a NaN: 1) totalOrder(−NaN, y) is true where −NaN represents a NaN with negative sign bit and y is a floating-point number. 2) totalOrder(x, −NaN) is false where −NaN represents a NaN with negative sign bit and x is a floating-point number. 3) totalOrder(x, +NaN) is true where +NaN represents a NaN with positive sign bit and x is a floating-point number. 4) totalOrder(+NaN, y) is false where +NaN represents a NaN with positive sign bit and y is a floating-point number. 5) If x and y are both NaNs, then totalOrder reflects a total ordering based on: i) negative sign orders below positive sign ii) signaling orders below quiet for +NaN, reverse for −NaN iii) otherwise, the order of NaNs is implementation-defined. Neither signaling NaNs nor quiet NaNs signal an exception. For canonical x and y, totalOrder(x, y) and totalOrder( y, x) are both true if x and y are bitwise identical. Unsigned NaNs, as may occur in non-interchange formats, should order like NaNs with positive sign bit
5) If x and y are both NaNs, then totalOrder reflects a total ordering based on: i) negative sign orders below positive sign ii) signaling orders below quiet for +NaN, reverse for −NaN iii) otherwise, the order of NaNs is implementation-defined.
regarding (5.iii), in this case our implementation should order according to the constructed "signed payload" values
Should all of these go into Base
? Would an interface package be more appropriate for some of the functions?
regarding negate(x::AbstractFloat)
where AbstractFloat is IEEE 754-2019 conformant
section 5.5.1 Sign bit operations negate(x) copies a floating-point operand x to a destination in the same format, reversing the sign bit. negate(x) is not the same as subtraction(0, x).
note section 6.3 The sign bit
When either an input or result is a NaN, this standard does not interpret the sign of a NaN. However, operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicates totalOrder and isSignMinus are also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation.
[however, I find this to be unhelpful :)]
Should all of these go into Base? Would an interface package be more appropriate for some of the functions?
Do you want to say Julia fully embraces IEEE 754-2019? If so, then they belong in Base. otoh "Language standards should define, to be implemented according to this subclause, as many of the operations in Table 9.1 as is appropriate to the language." [see first comment for a list]. From the perspective of the Standard, the question is not where to put them .. it is whether they are to be offered as standard part of Julia.
from IEEE 754-2019: sec 9.2 Additional mathematical operations
"Language standards should define, to be implemented according to this subclause, as many of the operations in Table 9.1 as is appropriate to the language. As with other operations of this standard, the names of the operations in Table 9.1 do not necessarily correspond to the names that any particular programming language would use. In this subclause the domain of an operation is that subset of the affinely extended reals for which the corresponding mathematical function is well defined. A conforming operation shall return results correctly rounded for the applicable rounding direction for all operands in its domain."
these functions include
there are short tables of required
y = f(x) where x in {±0, ±∞}
for many of these functions