Open jiahao opened 10 years ago
I think we should throw an error whenever the condition was explicitly checked for anyway.
To summarize the prevailing thinking with @StefanKarpinski @loladiro @JeffBezanson:
Operations whose results would produce NaNs should throw Exception
s if the cost of testing a NaN would not dominate the cost of the operation.
By this rule of thumb, elementary arithmetic on real floating-point numbers would be allowed to return NaNs if necessary, but more complicated operations like complex division should throw a DivideError
.
I don't understand. What operations should be allowed to return NaN, (unless the argument is NaN)? Only /(a::FloatingPoint, b::FloatingPoint)
?
I think we only have a clear idea that real elementary arithmetic (+, -, x, /) should be allowed to produce and propagate NaNs, since the cost of testing for a NaN or inputs that would produce NaNs would be comparable (or more expensive) than the actual operation itself, but not much else.
So all other functions (also those who does not return NaN for any real input) must check for NaN on input and throw DomainError?
No, NaN input can just propagate. We mostly don't want to produce NaNs from non-NaNs.
The poster child is complex division: rather than generating a complex NaN when dividing by zero, just raise an error. Since complex division is so, um, complex, checking for a zero divisor is not exactly going to affect it's performance in any meaningful way. Raising and error is simpler and alerts the user to the problem sooner.
Why not, but if that's the only operation which would be affected, is it worth introducing an inconsistency? It would be good to have a simple rule for which functions return NaN, and which throw an error (I mean, an user-understandable rule, not something based on computing costs).
My guess at the moment is that the rule is: only real floating-point +, -, *, / can produce NaNs from non-NaNs.
I would also add complex floating-point +, -, *, / to that list. The 1/complex(0.0,0.0)
case is a divide-by-zero error, so you can still get NaNs from complex /
in other ways.
As outlined in this discussion, a more systematic solution would be to take advantage of floating-point exception masking. This would allow users to turn exceptions on and off as required (e.g. have exceptions on for development/debugging, and off for production code), reduce the overhead (by fewer range checks) and generally make functions more vectorisation friendly.
As noted in the discussion linked above, it becomes necessary to turn off hardware floating point exceptions in some cases:
So it becomes rather difficult to decide when to allow floating point (and by extension complex) exceptions and when to disallow them, and the one piece of Julia code may be used in many situations.
How about the following proposal:
0.0/0.0
throws InvalidException
) by default, but all others are masked (1.0/0.0
returns an Inf
). We provide a command line argument (--mask-invalid
?) to disable this behaviour.cfunction
should use the floating point masks of the caller.ccallmask
?) to enable masking for the duration of the ccall
.DomainError
s can be retained for integer-valued functions.I feel that this should address the above concerns of @elextr, as well as address the needs of @eschnett and @mlubin mentioned on the mailing list discussion, and generally make us much closer to getting the proverbial Kahan tick of approval.
Possible issues:
InvalidException
(unless, of course, they explicitly disable masking beforehand).InvalidException
. Just using 0.0/0.0
doesn't work, as LLVM changes it to a constant NaN
. The openlibm trick of using return (x-x)/0.0
, where x
is the argument to the function, seems to work though: this should always raise an invalid exception unless x
is already a NaN
(which is the usual desired behaviour).ccall
s (for which LLVM should keep the correct order). But it does mean that you won't be able to enable it reliably for short snippets of Julia code.
@simonbyrne looks a good basic proposal but for the notes below.
In a Julia session, the invalid exception is unmasked (i.e. 0.0/0.0 throws InvalidException) by default, but all others are masked (1.0/0.0 returns an Inf). We provide a command line argument (--mask-invalid?) to disable this behaviour.
Thats ok, but it forces all ccalls to mask it off. So a default of all (or most?) exceptions on is not going to be any worse and that will provide the benefits to pure Julia code. Otherwise all off.
We can provide wrappers (ccallmask?) to enable masking for the duration of the ccall.
If at least one exception is going to be enabled by default (as above) then it is probably better that the default ccall turns them all off and restores the mask and ccallmask is used for the exceptional circumstances where it is known safe to leave some on. Turning all exceptions off is going to be needed for nearly all ccalls and doing it manually is a subtlety that shouldn't be required of newcomers to Julia. Note this will also need to be applied to all ccalls inside Julia, and its libraries and packages where it can't be proven that the code ccalled is exception safe or cannot generate exceptions.
Julia code embedded in other languages, or called via cfunction should use the floating point masks of the caller.
Functions and libraries should not change floating point masks without resetting them to their original setting.
I would just add "and must catch all exceptions" to both of these.
I would disagree that ccall
should mask by default, as it does add to the overhead of calling external functions.
Thinking about it some more (and at this point I am talking about Julia code only, the issues with foreign code remain as above) if the FPE masks can be changed:
So to avoid failures all Julia FP code is going to have to do one of:
None of these is particularly attractive, the first is very noisy and the other two are exceptionally error prone. At this point I admit I don't have a "nice" solution.
I fear that this would work about as well as dynamically scoped rounding modes – i.e. not well at all :-\
@elextr I don't think this should be an issue: the way I see it, the whole point of raising floating point exceptions is so that you can get rid of whatever is causing them, you don't actually want to try/catch
them (which is why I don't think my issue 1 above is such a problem). So a function should not expect an exception to be raised (because the function will simply stop anyway if one is raised), and should always accept NaN
as an argument.
Basically, masks should only be changed for two reasons:
ccall
for "exception-unaware" numerical librariesI would go so far as to say that packages which modify masks in any other way shouldn't be accepted in METADATA unless they can provide a good justification for why they need to do it.
@StefanKarpinski I guess the flip side is that I feel that DomainError
is basically trying to be an InvalidException
, except that it's slow, inconsistent and inflexible.
@simonbyrne It is often very difficult to ensure ahead of time that no nans appear in a calculation. Thus in my code, I often rather accept that the result is a nan, and fix up the problem later. These calculations happen in a tight inner loop, and handling exceptions in this loop is a non-starter, as it would prevent vectorization.
I appreciate that many people don't like having to deal with nans explicitly, because in most cases, nans result from errors e.g. in the input data. However, I quite like this part of the IEEE standard, and would like a way to use nans in Julia.
I'm not saying this should be the default, or that all library functions should be nan-safe in this way -- but there needs to be a way to use nans for efficiency in an HPC environment. I'd be happy to use a macro similar to @inbounds
for this. I'd also be happy to re-write Julia's standard math functions (sqrt
, exp
, log
) myself. However, there needs to be a reasonable way for experts to do so.
Maybe the solution is to introduce alternate versions of the built-in math functions which return NaN instead of throwing an exception?
@mlubin The problem is then the functions that call the built-in ones, and the ones that call those...
I do think the IEEE exception functionality can be made to work in a sensible fashion, and that it provides our best hope for ensuring consistency and interoperability with other programs.
@simonbyrne, any library/package functions that call log
should be expecting it to throw an exception, changing that behavior with a flag seems crazy.
Here is another case where nans are useful:
In vectorized code, branches are expensive. It is often cheaper to evaluate both branches of an if statement, in particular if one of them is very cheap.
Take as example the expression x>0 ? log(x) : C
with some constant C
. The result is always well-defined. However, it may be much faster to evaluate ifelse(x>0, log(x), C)
instead. A nan may be generated, but will always be ignored.
@eschnett, good point. I've also come across cases where it can be convenient to compute a log even when it may not be used later.
@simonbyrne I agree that generating exceptions is a nice way to do a class of problems. But that should not make alternative strategies impossible, or so expensive or complicated that they are effectively impossible.
the way I see it, the whole point of raising floating point exceptions is so that you can get rid of whatever is causing them, you don't actually want to try/catch them (which is why I don't think my issue 1 above is such a problem).
NaNs can be caused by data, not only by coding errors. Real world data can be noisy and that can cause any of the conditions that generate exceptions. It is often not acceptable to throw an exception and stop processing due to one messy data point.
I would go so far as to say that packages which modify masks in any other way shouldn't be accepted in METADATA unless they can provide a good justification for why they need to do it.
Which makes the callback situation untenable, https://groups.google.com/d/msg/julia-dev/EaO1VSOXx2o/ruSZr0jhpigJ won't be able to use anything in METADATA then.
@elextr My whole reason for being so keen on the floating point exceptions is that they make different strategies feasible: e.g. "0/0
is an error, I should fix it", as well as "0/0
is just a weird edge case, I can ignore those cases". The main thing is that the strategy is consistent throughout the program.
With regards to the callback situation: the way I envision it is that you would use a masked ccall
, so any callbacks made during this time would return NaN
s instead of throwing exceptions (though of course, I could be wrong on how that would work). If you wanted to do the same thing in a Julia library instead of a C one, then perhaps that could be regarded as a good justification (as long as the library then returns the masks to their original settings).
My whole reason for being so keen on the floating point exceptions is that they make different strategies feasible: e.g. "0/0 is an error, I should fix it", as well as "0/0 is just a weird edge case, I can ignore those cases". The main thing is that the strategy is consistent throughout the program.
Agree its fine if the whole program is consistent.
What I belatedly came to realise, is that being consistent throughout the program becomes impossible once you use external libraries that may be written to the opposite strategy to yours.
As I said previously, this leads to the the situation where all functions need to set and restore the FPE mask to suit their strategy, in case they are called from code with a differing strategy. Or all code needs to know what strategy the functions they call use.
And that applies to callbacks as well. I agree that callbacks from C will probably be called with exceptions off, since the calling C code will be running with them off but callbacks from (for example) solvers written in Julia may be called with exceptions on. But whichever it is, then they call other libraries, including base, and then the paragraph above applies.
What I belatedly came to realise, is that being consistent throughout the program becomes impossible once you use external libraries that may be written to the opposite strategy to yours.
Agreed here. The logical alternative is to provide versions of the basic functions that return NaN
s. This at least allows everyone to be consistent according to the situation. I'm guessing that typically one would want a DomainError
to be thrown for user friendliness, but in specialized cases, e.g., vectorization, interacting with C code, etc., switching log
to log_nan
(horrible name) makes more sense.
log�()
? ;)
@mlubin that requires that the code of every library that you want to use (including the C ones) is changed to use log_nan
since thats what the current behaviour of log
is.
It would be better if the new behaviour has the new name that generates exceptions, eg ex_log
as an example of an equally bad name :)
Following the open-source philosophy, I've created a package which solves my problem: https://github.com/mlubin/NaNMath.jl.
I had a bit of time on Friday to try out my idea:
https://github.com/simonbyrne/julia/tree/fpe
when combined with my mxcsr
script gives some interesting results.
On Linux it seems to work fairly well, the only problem being that after throwing one error it seems re-enables the mask. Hopefully there is a sensible way to fix this. One idea would be to use the "official" feenableexcept
interface, but I haven't had a chance to look at it yet.
On OS X, my experience was more frustrating: it does throw exceptions at the correct times, but it doesn't seem to return the correct exception code that indicates the type of exception (Invalid, Inexact, DivByZero, etc). Unfortunately, it seems that OS X doesn't officially support throwing floating point exceptions at all (e.g. see here), so we might have a hard time getting Apple to fix it.
If anyone has used this sort of stuff in C or C++, I'd appreciate any help or suggestions that I could try.
@simonbyrne This example http://www-personal.umich.edu/~williams/archive/computation/fe-handling-example.c can be found by chasing the link you provided. It seems to provide an implementation of the necessary functionality for OS X, both for PPC (!) and Intel Macs.
The respective function calls translate to a few machine instructions, so we don't need any Apple support for this.
Thanks @eschnett. It looks pretty similar to what I was doing, but I'll try it out. I seem to recall that the problem was the signals returned by siginfo were completely arbitrary.
Bump.
What would happen if functions had a nan mode in them, so sqrt would be sqrt(x,nan=false) with a default value? To me this seems to let libraries do what they want, let users do what they want, and not mess with code generation.
(Context: this issue has come up recently in #4967 and elsewhere)
We currently don't treat the results of indeterminate computations consistently between real and complex arithmetic.
Example 1: cosine
Example 2: inverse
inverse of 0
This illustrates a problem with infinity and signed zero. Unlike real infinities, of which there are only two
FloatingPoint
values, you can have 13 possible representations of complex infinities which convey different information about the phase (argument) of the complex infinity:The result is correct if we work with unsigned zeros. However, after accounting for the signed zeros in
complex(+0.0, +0.0)
, this result should becomplex(+Inf, +Inf)
. ADivideError
might also be a reasonable alternative here.inverse of infinities
The inverse mapping is also problematic:
inverse of indeterminates
Is it meaningful to distinguish between these three possible complex NaNs? It seems silly in this example, but consider also:
etc.
Example 3: roots (nonintegral powers)
Whereas
sqrt(-1)
returns the notoriousDomainError
, the inverse square root (as computed byx->x^-1/2
does not, but returns aNaN
instead. This use ofNaN
is sanctioned by IEEE 754 in the specific case of a real operation with no real output.tl;dr: if a floating-point computation returns an indeterminate value, when should it return a
NaN
(or any of its complex variants), and when should it throw an error likeDivideError
orDomainError
? By my reading of IEEE 754, both of these behaviors are allowed (throwing an error would correspond to a signalingNaN
which is trapped by the error handler). Each of these examples in isolation show valid behavior; however, we should be consistent.