Open LilithHafner opened 1 year ago
In defiance of "never say never", it's not a horrible bet that literally no Julia user relies on the NaN semantics of any function beyond isnan(f(NaN)). (EDIT: someone below says that payload tagging is used in SentinelArrays.jl.) There would be a cost to maintaining this behavior when the basic functions do not. For how exceedingly rarely somebody cares and how relatively cheaply one can wrap a function for a specific semantic, I think we can afford to adopt any-NaN semantics (a name I made up, meaning that what payload is produced from NaN inputs is unspecified). Further, any-NaN semantics leave room to make non-breaking changes in the future if the landscape shifts such that people do actually care about payloads. It's also robust to hardware that makes unusual NaN propagation choices (I don't think IEEE754 dictates a specific semantic they must follow EDIT: see end of post).
Up to now, almost every function has been implemented with an input-NaN semantic (another name I made up, specifying that the payload of one NaN input is propagated to the output). This is also what's usually (but perhaps not always?) used by hardware-native operations. In fact, our current input-NaN semantic is usually contingent on hardware input-NaN semantics.
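To make the "wrap a function for a specific semantic" remark concrete, here is a minimal sketch of what such wrappers might look like. The names propagate_nan and canonical_nan are hypothetical and do not exist in Base; the sketch only covers unary functions on floating-point inputs.

```julia
# Hypothetical helpers, not part of Base.

# Force input-NaN semantics: a NaN input is returned unchanged, payload included.
propagate_nan(f) = x -> isnan(x) ? x : f(x)

# Force a canonical result: any NaN output is replaced by the standard NaN.
canonical_nan(f) = x -> (y = f(x); isnan(y) ? oftype(y, NaN) : y)

# A NaN with a nonzero payload (low 8 mantissa bits set).
tagged = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff)

sin_keep = propagate_nan(sin)
sin_canon = canonical_nan(sin)

sin_keep(tagged) === tagged   # true: === compares bits, so the payload survived
sin_canon(tagged) === NaN     # true: the payload was dropped
sin_keep(0.0) == 0.0          # true: non-NaN inputs go straight to sin
```

Under any-NaN semantics, a caller who needs one of these behaviors would opt in with a wrapper like this rather than relying on what a particular Base function happens to do.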
There is a risk that this results in re-implementations of functions that "break" existing behaviors if somebody really did rely on payload propagation. For example, I believe there is a faster min -- but maybe not max -- on x86 if you're willing to mangle payloads. But there was never any formal guarantee and I'm not sure that anybody ever cared.
Is the proposal to document this centrally or on a per-function basis? Per-function seems like it would be a never-finished job and add noise to docstrings, so I'd propose to only document it centrally. Functions which have notable deviations should be documented locally. For example, that hypot is not poisoned by NaN in the presence of Inf.
EDIT: Originally, I was unsure of IEEE754's stance on payload propagation. The document linked by a later poster suggests that "The current standard specifies that if an operation has multiple NaN inputs, then the result should be one of the input NaNs. The standard does not specify which one." I assume this extends to unary functions as well.
If no users care that would make things easier. I posted on slack and discourse for higher visibility.
We could also run pkgeval on a branch that mangles all NaNs, but that seems like a lot of work.
See also this IEEE standards document for some more background. The most common extant applications of NaN payloads seem to be (a) tracking exception types and (b) tracking NA (missing) values in R, neither of which is especially critical in Julia (because we normally use exceptions and missing values, respectively). (I could imagine some Julia application using R-style tagged-NaNs instead of Union{Missing,Float64} for performance/memory reasons, I guess?) There is also JavaScript-style NaN boxing, which seems even less likely in Julia. The IEEE document also mentions some general issues with trying to propagate NaN payloads.
We use an R-style tagged-NaN in SentinelArrays.jl.
Specifically, this NaN:
julia> Core.bitcast(Float64, typemax(UInt64))
NaN
because we do a memset with 0xff on the Vector{Float64} to set missing.
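For readers who want to see why an 0xff byte fill yields that particular NaN, here is a small sketch. It is illustrative only and is not SentinelArrays.jl's actual code; is_sentinel is a hypothetical name.

```julia
# Illustrative only; not SentinelArrays.jl's implementation.
v = zeros(Float64, 4)

# Byte-wise fill of the underlying storage, equivalent in effect to memset(ptr, 0xff, nbytes).
fill!(reinterpret(UInt8, v), 0xff)

sentinel = v[1]
isnan(sentinel)                                   # true
reinterpret(UInt64, sentinel) == typemax(UInt64)  # true: all 64 bits are set
sentinel === NaN                                  # false: the standard NaN is 0x7ff8000000000000

# Hypothetical check for a slot marked as missing: compare bits, not values.
is_sentinel(x::Float64) = reinterpret(UInt64, x) == typemax(UInt64)
```

A scheme like this only works if the sentinel's exact bits survive whatever the value passes through, which is why payload mangling matters to it.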
It seems that some people do use payload tagging in some cases.
Further,
> The document linked by a later poster suggests that "The current standard specifies that if an operation has multiple NaN inputs, then the result should be one of the input NaNs. The standard does not specify which one." I assume this extends to unary functions as well.
If true, this would mean that to reject payload propagation semantics would be to violate IEEE754 semantics on any function defined therein. I'm not excited about this prospect.
Let's talk cost/benefit. Are there functions that we would implement differently with loosened NaN semantics? I mentioned a small optimization of min on x86 (not aarch64) but it wouldn't be game-changing. Any others?
It seems that, if anything, we might have to document a general policy (although perhaps not a strict guarantee) that a NaN output resulting from one or more NaN inputs should include the payload of one of the NaN inputs. We'd have to adhere to this policy for IEEE754 functions but probably should in other cases as well.
It seems like for any function that returns NaN when one of the inputs is NaN, we can try to return the NaN that was passed in. That's how hardware float operations work, so it often happens naturally. In places where we "generate" a NaN, we should produce the "standard NaN", namely the one you get when you evaluate NaN.
Somewhat related food for thought.
Propagation of NaN payloads through various functions in Base is haphazard at best -- and I am not suggesting that it must be uniform! -- but this fact is likely (happily) overlooked by the vast majority of users and developers. Bearing in mind the limitations imposed by LLVM, it is worthwhile to question what might be done.
sin is a simple example where we do something in Julia which mangles a payload (i.e. we return the "standard NaN").
The code below demonstrates some of the heterogeneity.
x = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff);
for f in (sin, cos, tan, acos, asin, atan, log, exp, sqrt, abs2)
println(f, "\t:\t", bitstring(f(x)))
end
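Reading bitstrings by eye is error-prone, so here is a sketch of a helper that classifies each function instead. The names payload and keeps_payload are hypothetical, and the mask assumes the Float64 layout (the low 51 mantissa bits, below the quiet bit). What each function reports depends on the Julia version and hardware, so no output is shown.

```julia
# Hypothetical helpers for inspecting Float64 NaNs; not part of Base.

# Low 51 mantissa bits, i.e. everything below the quiet bit.
payload(x::Float64) = reinterpret(UInt64, x) & 0x0007ffffffffffff

# True when f maps the NaN x to a NaN carrying the same payload.
keeps_payload(f, x::Float64) = (y = f(x); isnan(y) && payload(y) == payload(x))

x = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff)

for f in (sin, cos, tan, acos, asin, atan, log, exp, sqrt, abs2)
    println(rpad(string(f), 6), keeps_payload(f, x) ? "keeps payload" : "drops or changes payload")
end
```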
If we want to follow Stefan's logic, then all occurrences which amount to isnan(x) && return NaN must instead be isnan(x) && return x. That is easily done and carries no penalty, at least from a conceptual standpoint. The test suite may inadvertently rely on the extant behavior, but such reliance should not be too substantial in Base. The ecosystem at large may rely on the haphazard NaN behavior for testing (i.e. silencing of payloads by some functions); I suppose PkgEval could measure the extent of the damage.
That sin example is a good catch. A return x will be a bit faster since you don't have to load a new NaN value and can just return the one you have in a register already. That said, in general I don't really want to document NaN behavior since, especially for two-argument functions, I could see it being useful in some cases to make NaNs with arbitrary combinations of the bits of the input NaNs.
> I don't really want to document NaN behavior since...
I concur on leaving NaN behavior undocumented. Strategic ambiguity, particularly in light of the uncertainty about what might become commonly adopted 10 years from now (once the dust settles around IEEE, LLVM's handling of NaNs, random community drift, etc.), can be a good thing.
How would y'all feel about the proposal in the OP: document returned payload as undefined
The word "undefined" is a little scary because people think of C UB, but documenting it as not stable between versions would be great.
C standards would call that unspecified behavior
I think it would be fine to document it as not something that can be relied on, but still try to return the first NaN argument when possible. We can try to do that and decide later if it's worth it.
> but still try to return the first NaN argument when possible
I disagree. I would say "one of the NaN arguments when possible." Anything more than that is going to be untenable. For example, the following two operations are each implemented using a single native x86 instruction, yet they don't return the same positional operand when given two NaNs:
julia> x = reinterpret(Float64,-1); y = reinterpret(Float64,-2);
julia> reinterpret(Int, x+y) # vaddsd
-1
julia> reinterpret(Int, ifelse(x<y,x,y)) # vminsd
-2
Hardware does not take strong positions on this so supporting any positional preference would be a pain even on a single architecture (to say nothing of multiple). Plus, compilers are free to fiddle with some operations (e.g., a+b for b+a) so inlining and other factors can change behavior even with the hardware held constant.
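If the policy were "one of the NaN arguments when possible", a test for it would have to be position-agnostic. A rough sketch of such a check follows; returns_an_input is a hypothetical name, not a Base function. The loop's output is left unstated because it depends on the hardware and on what the compiler does with each operation.

```julia
# Hypothetical check, not part of Base:
# is the result bit-identical to either input? (=== compares bits for Float64.)
returns_an_input(op, a::Float64, b::Float64) = (r = op(a, b); r === a || r === b)

# Two NaNs with different payloads, as in the example above.
x = reinterpret(Float64, -1)
y = reinterpret(Float64, -2)

for op in (+, -, *, /, min, max)
    println(op, ": ", returns_an_input(op, x, y))
end
```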
Yep, good point. One of the NaN arguments should be what we try to do.
I feel I should note, to my great annoyance, that payload propagation is a should and not a shall according to IEEE754.
I think it is still wise to try to attempt it because it simplifies reasoning about a rather... quirky condition in a type. In particular, it means that you know exactly what the value is when you do a binary operation of any NaN and any non-NaN (generally: the NaN).
There are a bunch of different NaN values. reinterpret(Float64, reinterpret(UInt64, NaN) + 1) and NaN are two examples.
NaNs propagate through floating point operations. sin(NaN) must be NaN in the sense of isnan(sin(NaN)), but which NaN should it return? Must it return the canonical NaN? Must it return its input? May it return some other value that satisfies isnan for performance reasons? These questions come up for most math functions, min/max/sort, and possibly others.
I propose to explicitly document that mathematical functions (e.g. sin, hypot, min) will produce a NaN result on NaN input but that which NaN is produced is an implementation detail.
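As a quick illustration of the two example NaNs mentioned in this post, the following sketch shows how the usual comparison functions treat them. The variable names a and b are just for illustration.

```julia
a = NaN
b = reinterpret(Float64, reinterpret(UInt64, NaN) + 1)  # standard NaN with the lowest payload bit set

isnan(a) && isnan(b)          # true: both are NaNs
a === b                       # false: === compares bits, and the payloads differ
a == b                        # false: a NaN is never == to anything
isequal(a, b)                 # true: isequal treats all NaNs as equal
bitstring(a) == bitstring(b)  # false: the bit patterns differ in the last bit
```

Code that only ever asks isnan or isequal cannot tell these values apart, which is the sense in which the produced payload can be treated as an implementation detail.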