Document NaN policy - GitHub Issues

LilithHafner commented 1 year ago

There are a bunch of different NaN values. reinterpret(Float64, reinterpret(UInt64, NaN) + 1) and NaN are two examples.
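As a minimal sketch of what "different NaN values" means in practice, the snippet below compares the two examples bit by bit. The helper name `nanbits` is made up for illustration and is not part of Base.

```julia
# Two distinct bit patterns that are both NaN.
canonical = NaN
other = reinterpret(Float64, reinterpret(UInt64, NaN) + 1)

# Hypothetical helper to view the raw bits of a Float64.
nanbits(x::Float64) = reinterpret(UInt64, x)

println(isnan(canonical), " ", isnan(other))      # both satisfy isnan
println(string(nanbits(canonical), base = 16))    # raw bits of the canonical NaN
println(string(nanbits(other), base = 16))        # same bits plus one in the payload
println(canonical === other)                      # false: === compares bit patterns
println(isequal(canonical, other))                # true: isequal treats all NaNs as equal
```

So `isnan` and `isequal` cannot tell the two apart, while `===` and `reinterpret` can.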

NaNs propagate through floating point operations. sin(NaN) must be NaN in the sense of isnan(sin(NaN)), but which NaN should it return? Must it return the canonical NaN? Must it return its input? May it return some other number that isnan for performance reasons? These questions come up for most math functions, min/max/sort, and possibly others.

I propose to explicitly document that mathematical functions (e.g. sin, hypot, min) will produce a NaN result on NaN input but that which NaN is produced is an implementation detail.

mikmoore commented 1 year ago

In defiance of "never say never", ~~it's not a horrible bet that literally no Julia user relies on the NaN semantics of any function beyond isnan(f(NaN))~~ EDIT: someone below says that payload tagging is used in SentinelArrays.jl. There would be a cost to maintaining this behavior when the basic functions do not. For how exceedingly rarely somebody cares and how relatively cheaply one can wrap a function for a specific semantic, I think we can afford to adopt any-NaN semantics (a name I made up, meaning that what payload is produced from NaN inputs is unspecified). Further, any-NaN semantics leave room to make non-breaking changes in the future if the landscape shifts such that people do actually care about payloads. It's also robust to hardware that makes unusual NaN propagation choices (~~I don't think IEEE754 dictates a specific semantic they must follow~~ EDIT: see end of post).
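To illustrate how cheaply a caller could wrap a function to get a specific semantic, here is a rough sketch. The name `propagate_nan` is hypothetical and is not an existing Base function; it forces input-NaN behavior for a unary function under any-NaN semantics.

```julia
# Hypothetical wrapper: force input-NaN semantics on any unary function `f`,
# regardless of what `f` itself does with NaN payloads.
propagate_nan(f, x::AbstractFloat) = isnan(x) ? x : f(x)

# A NaN carrying a non-default payload in its low byte.
tagged = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff)

y = propagate_nan(sin, tagged)
println(y === tagged)              # the wrapper hands the input NaN back unchanged
println(propagate_nan(sin, 0.0))   # non-NaN inputs go straight to sin
```

A caller who cares about payloads pays one `isnan` check per call; everyone else pays nothing.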

Up to now, almost every function has been implemented with an input-NaN semantic (another name I made up, specifying that the payload of one NaN input is propagated to the output). This is also what's usually (but perhaps not always?) used by hardware-native operations. In fact, our current input-NaN semantic is usually contingent on hardware input-NaN semantics.

There is a risk that this results in re-implementations of functions that "break" existing behaviors if somebody really did rely on payload propagation. For example, I believe there is a faster min -- but maybe not max -- on x86 if you're willing to mangle payloads. But there was never any formal guarantee and I'm not sure that anybody ever cared.

Is the proposal to document this centrally or on a per-function basis? Per-function seems like it would be a never-finished job and add noise to docstrings, so I'd propose to only document it centrally. Functions which have notable deviations should be documented locally. For example, that hypot is not poisoned by NaN in the presence of Inf.
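For reference, the hypot deviation mentioned above can be checked directly. This is only a small sketch of the behavior as I understand it in current Julia versions, where an infinite argument takes precedence over a NaN one.

```julia
# An infinite argument wins over a NaN argument in hypot.
println(hypot(Inf, NaN))    # Inf
println(hypot(NaN, -Inf))   # Inf

# Without an Inf present, NaN propagates as usual.
println(hypot(1.0, NaN))    # NaN
```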

EDIT: Originally, I was unsure of IEEE754's stance on payload propagation. The document linked by a later poster suggests that "The current standard specifies that if an operation has multiple NaN inputs, then the result should be one of the input NaNs. The standard does not specify which one." I assume this extends to unary functions as well.

LilithHafner commented 1 year ago

If no users care that would make things easier. I posted on slack and discourse for higher visibility.

We could also run pkgeval on a branch that mangles all NaNs, but that seems like a lot of work.

stevengj commented 1 year ago

See also this IEEE standards document for some more background. The most common extant applications of NaN payloads seem to be (a) tracking exception types and (b) tracking NA (missing) values in R, neither of which is especially critical in Julia (because we normally use exceptions and missing values, respectively). (I could imagine some Julia application using R-style tagged-NaNs instead of Union{Missing,Float64} for performance/memory reasons, I guess?) There is also JavaScript-style NaN boxing, which seems even less likely in Julia. The IEEE document also mentions some general issues with trying to propagate NaN payloads.

quinnj commented 1 year ago

We use an R-style tagged-NaN in SentinelArrays.jl.

Specifically, this NaN:

julia> Core.bitcast(Float64, typemax(UInt64))
NaN

because we do a memset with 0xff on the Vector{Float64} to set missing.
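The following is a rough sketch of the idea, not SentinelArrays.jl's actual code. It fills the bytes of a Vector{Float64} with 0xff and then detects the resulting sentinel by its bit pattern. The helper `is_sentinel` is a made-up name for illustration.

```julia
v = Vector{Float64}(undef, 4)

# Byte-wise fill, analogous to memset(ptr, 0xff, nbytes).
fill!(reinterpret(UInt8, v), 0xff)

# Hypothetical helper: true only for the all-ones NaN, not for any other NaN.
is_sentinel(x::Float64) = reinterpret(UInt64, x) == typemax(UInt64)

println(all(isnan, v))         # every element is a NaN
println(all(is_sentinel, v))   # and specifically the all-ones NaN

v[2] = 1.5                     # overwrite one slot with real data
println(is_sentinel.(v))       # only the untouched slots are still sentinels
println(is_sentinel(NaN))      # the standard NaN is not the sentinel
```

A scheme like this only works while the sentinel's bits are left untouched, which is why payload mangling matters to it.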

mikmoore commented 1 year ago

It seems that some people do use payload tagging in some cases.

Further,

> The document linked by a later poster suggests that "The current standard specifies that if an operation has multiple NaN inputs, then the result should be one of the input NaNs. The standard does not specify which one." I assume this extends to unary functions as well.

If true, this would mean that to reject payload propagation semantics would be to violate IEEE754 semantics on any function defined therein. I'm not excited about this prospect.

Let's talk cost/benefit. Are there functions that we would implement differently with loosened NaN semantics? I mentioned a small optimization of min on x86 (not aarch64) but it wouldn't be game-changing. Any others?

It seems that, if anything, we might have to document a general policy (although perhaps not a strict guarantee) that a NaN output resulting from one or more NaN inputs should include the payload of one of the NaN inputs. We'd have to adhere to this policy for IEEE754 functions but probably should in other cases as well.

StefanKarpinski commented 1 year ago

It seems like for any function that returns NaN when one of the inputs is NaN, we can try to return the NaN that was passed in. That's how hardware float operations work, so it often happens naturally. In places where we "generate" a NaN, we should produce the "standard NaN", namely the one you get when you evaluate NaN.

andrewjradcliffe commented 1 year ago

Somewhat related food for thought.

Propagation of NaN payloads through various functions in Base is haphazard at best -- and I am not suggesting that it must be uniform! -- but this fact is likely (happily) overlooked by the vast majority of users and developers. Bearing in mind the limitations imposed by LLVM, it is worthwhile to question what might be done.

sin is a simple example where we do something in Julia which mangles a payload (i.e. we return the "standard NaN").

The code below demonstrates some of the heterogeneity.

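# Build a NaN whose low payload byte is set, so payload loss is visible in the bits.
x = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff);
# Print the bit pattern each function returns for that tagged NaN.
for f in (sin, cos, tan, acos, asin, atan, log, exp, sqrt, abs2)
    println(f, "\t:\t", bitstring(f(x)))
end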

If we want to follow Stefan's logic, then all occurrences which amount to isnan(x) && return NaN must instead be isnan(x) && return x. Easily done and without penalties, at least from a conceptual standpoint; the test suite may inadvertently rely on the extant behavior, but that reliance should not be too substantial in Base. The ecosystem at large may rely on the haphazard NaN behavior for testing (i.e. silencing of payloads by some functions); I suppose PkgEval could measure the extent of the damage.
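A minimal sketch of the two patterns side by side follows. Both functions are hypothetical and are not taken from Base. They differ only in what they return on the NaN-input branch, while a NaN that is "generated" from a non-NaN input stays the standard one in both.

```julia
# Hypothetical: discards the payload of a NaN input.
function log_mangling(x::Float64)
    isnan(x) && return NaN        # always the standard NaN
    return x < 0 ? NaN : log(x)
end

# Hypothetical: hands the input NaN back, payload intact.
function log_preserving(x::Float64)
    isnan(x) && return x
    return x < 0 ? NaN : log(x)   # a freshly generated NaN is the standard one
end

tagged = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff)

println(bitstring(log_mangling(tagged)))     # payload bits cleared
println(bitstring(log_preserving(tagged)))   # payload bits kept
```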

oscardssmith commented 1 year ago

That sin example is a good catch. A return x will be a bit faster since you don't have to load a new NaN value and can just return the one you have in a register already. That said, in general I don't really want to document NaN behavior since, especially for 2-argument functions, I could see it being useful in some cases to make NaNs with arbitrary combinations of the bits of the NaN inputs.
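One way to picture a NaN built from a combination of input bits is a bitwise OR of the two inputs. This is purely an illustrative sketch: `combine_nans` and `nanbits` are made-up names, and nothing here is claimed to be what any Base function does. It assumes both arguments are already NaN, in which case the OR keeps the exponent all ones and the mantissa nonzero, so the result is still a NaN.

```julia
# Hypothetical helpers for illustration only.
nanbits(x::Float64) = reinterpret(UInt64, x)
combine_nans(a::Float64, b::Float64) = reinterpret(Float64, nanbits(a) | nanbits(b))

a = reinterpret(Float64, nanbits(NaN) | 0x0f)   # payload in the low nibble
b = reinterpret(Float64, nanbits(NaN) | 0xf0)   # payload in the next nibble
c = combine_nans(a, b)

println(isnan(c))                 # still a NaN
println(c === a || c === b)       # false: it matches neither input
println(bitstring(c))
```

A result like `c` would be ruled out by a documented "one of the input NaNs" guarantee.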

andrewjradcliffe commented 1 year ago

> I don't really want to document NaN behavior since...

I concur on leaving NaN behavior undocumented. Strategic ambiguity, particularly in light of the uncertainty about what might become commonly adopted 10 years from now (once the dust settles around IEEE, LLVM's handling of NaNs, random community drift, etc.), can be a good thing.

LilithHafner commented 1 year ago

How would y'all feel about the proposal in the OP: document the returned payload as undefined?

oscardssmith commented 1 year ago

The word undefined is a little scary because people think of C UB, but documenting it as not stable between versions would be great.

vtjnash commented 1 year ago

C standards would call that unspecified behavior

StefanKarpinski commented 1 year ago

I think it would be fine to document it as not something that can be relied on, but still try to return the first NaN argument when possible. We can try to do that and decide later if it's worth it.

mikmoore commented 1 year ago

> but still try to return the first NaN argument when possible

I disagree. I would say "one of the NaN arguments when possible." Anything more than that is going to be untenable. For example, the following two operations are each implemented using a single native x86 instruction yet don't return the same positional operand when given two NaNs:

julia> x = reinterpret(Float64,-1); y = reinterpret(Float64,-2);

julia> reinterpret(Int, x+y) # vaddsd
-1

julia> reinterpret(Int, ifelse(x<y,x,y)) # vminsd
-2

Hardware does not take strong positions on this so supporting any positional preference would be a pain even on a single architecture (to say nothing of multiple). Plus, compilers are free to fiddle with some operations (e.g., a+b for b+a) so inlining and other factors can change behavior even with the hardware held constant.
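A small sketch for probing this on a given machine is below. The helper `payload_source` is hypothetical. Which symbol is printed for the additions depends on the hardware and on compiler choices, so no particular output is claimed for those lines. The ifelse line is the exception: at the language level it selects y, because x < y is false whenever a NaN is involved.

```julia
nanbits(x::Float64) = reinterpret(UInt64, x)

# Hypothetical helper: report which operand's bit pattern the result carries.
function payload_source(r::Float64, x::Float64, y::Float64)
    nanbits(r) == nanbits(x) && return :first
    nanbits(r) == nanbits(y) && return :second
    return :other
end

x = reinterpret(Float64, -1)
y = reinterpret(Float64, -2)

println(payload_source(x + y, x, y))                 # platform-dependent
println(payload_source(y + x, y, x))                 # platform-dependent
println(payload_source(ifelse(x < y, x, y), x, y))   # selects y
```

The `:other` case covers hardware that returns a canonical NaN rather than either input.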

StefanKarpinski commented 1 year ago

Yep, good point. One of the NaN arguments should be what we try to do.

workingjubilee commented 1 year ago

I feel I should note, to my great annoyance, that payload propagation is a should and not a shall according to IEEE754.

I think it is still wise to try to attempt it because it simplifies reasoning about a rather... quirky condition in a type. In particular, it means that you know exactly what the value is when you do a binary operation of any NaN and any non-NaN (generally: the NaN).

JuliaLang / julia

Document NaN policy #48523