JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.4k stars 5.45k forks source link

Nullable and boolean operations #13207

Closed nalimilan closed 8 years ago

nalimilan commented 8 years ago

@davidagold raised at https://github.com/JuliaStats/NullableArrays.jl/issues/62 the issue that Nullable{Bool} currently doesn't work well with boolean operations, including if, && and ||, which has consequences on many functions, like find.

1) Currently, the implementation in Base simply raises an error when comparing two Nullable:

julia> Nullable(1) == Nullable(1)
ERROR: NullException()
 in == at nullable.jl:52

NullableArrays.jl improves this by comparing the contents of the Nullables and returning a Nullable{Bool}, raising an error if one of them is NULL.

2) But even with this change, working with the result of a comparison between Nullables isn't convenient, as it cannot be used with if, && and ||:

julia> y == Nullable(2)
Nullable(false)

julia> y == Nullable(1)
Nullable(true)

julia> if y == Nullable(1)
           print("x")
       end
ERROR: TypeError: non-boolean (Nullable{Bool}) used in boolean context

# Solution:
julia> if get(y == Nullable(1))
           print("x")
       end
x

This means that NullableArray needs to come with its own implementation for many functions to save the user from typing get all over the place. For example:

julia> z = NullableArray([1, 2])
2-element NullableArray{Int64,1}:
 1
 2

julia> find(x->x==Nullable(1), z)
ERROR: TypeError: non-boolean (Nullable{Bool}) used in boolean context
 in find at array.jl:764

# Solution in the absence of a specific method:
julia> find(x->get(x==Nullable(1)), z)
1-element Array{Int64,1}:
 1

Would it be possible to allow Nullable{Bool} in if, && and ||? The semantics are well-defined: they would be strictly equivalent to calling get on the Nullable: Nullable(true) is equivalent to true, Nullable(false) is equivalent to false, and Nullable() raises an error. Special-casing Nullable would have the advantage that no notion of "truthiness" is introduced in Julia, with the drawback that it wouldn't be generic -- but maybe Nullable is the only type for which this definition makes sense.

Cc: @davidagold @johnmyleswhite

JeffBezanson commented 8 years ago

No, this is not reasonably possible. However this makes another good argument that Nullable should just be Union{T, Void}. That's already built in, will work as needed here, and just needs to be made efficient.

davidagold commented 8 years ago

If that design could be made comparably efficient, then it seems like that would essentially obviate the whole issue of lifting as well.

The biggest downside of the current situation I have seen is that a method defined for, say, an AbstractArray argument and which would otherwise work for a NullableArray argument needs to be copied almost verbatim just to modify the conditionals, e.g. here. This pattern makes method extension less elegant than otherwise possible using Julia's multiple dispatch.

nalimilan commented 8 years ago

@JeffBezanson Do you think Union{T, Void} could be made efficient enough in a reasonable time frame?

malmaud commented 8 years ago

I won't rehash the whole debate here and don't have a strong opinion, but I'll point out that some people like Nullable because it forces authors to explicitly consider the case when return values are potentially null; with Union{T,Void} your code might appear to work in limited tests but fail when a function returns a nothing that you weren't expecting.

JeffBezanson commented 8 years ago

Also Nullable{T}() conveys type information, while nothing does not. So we'd probably need Null{T}() and Union{T, Null{T}}.

One worry with making Nullable built in is that while some cases can be handled automatically (e.g. unwrapping Nullable{Bool} in if), we will never run out of cases where somebody wants Nullable automatically unwrapped and it isn't. One reason is that branches are a rare case of a context in julia that requries a specific type.

johnmyleswhite commented 8 years ago

I think we should follow C#'s lead: only auto-lift core math operations. That solves the infinite space of possible unwrappings very readily.

StefanKarpinski commented 8 years ago

"core math operations" is a far less well-defined thing in Julia than in C#, unfortunately.

nalimilan commented 8 years ago

Well, in C# comparison operators for Nullable return a boolean, not a Nullable, so this issue does not really exist. That said, I don't think returning a boolean is a good idea, better preserve missingness IMHO (https://github.com/JuliaStats/NullableArrays.jl/issues/74). In C#, | and & do preserve missingness, which is kind of inconsistent with comparisons...

malmaud commented 8 years ago

This is too far afield for now, but a more flexible if system that could potentially handle this situation could also be useful for nonstandard interpretation usages like https://github.com/zenna/Sigma.jl where you want to have code like if x where x::RandomVariable{Bool}.

nalimilan commented 8 years ago

@malmaud Can you detail this use case? (Though I suspect this is exactly what Jeff is worried about... :-)

malmaud commented 8 years ago

What is the rational for not having comparisons between nullables return booleans? I can kind of see the purity argument, but is there a concrete problem it would cause?

davidagold commented 8 years ago

@malmaud what do you do if one of the arguments is null? If you choose to throw an error, you lose the ability to propagate missingness. If you return an empty Nullable, then your method is not type-stable. Though perhaps this latter point could be assuaged by developments in the type inference system.

malmaud commented 8 years ago

Return false? I might naively expect that it's simply a false statement that Nullable(1)==Nullable{Int}().

ScottPJones commented 8 years ago

Nullable{Int} == Nullable{Int} should also return false, if you want to do 3 value logic.

davidagold commented 8 years ago

Here's my comment above phrased as a reason for why auto-lifting if over Nullable{Bool} values may be defensible: if is not extensible via multiple dispatch. If f(u::U) calls g(u), and g is the only part of f that does not have a Nullable specific implementation, then it is trivial to extend f over a signature of (u::Nullable{U}) by including a straightforward extension of g. On the other hand, if f branches on a Nullable{Bool}, then the entirety of f needs to be re-written.

On the other hand, I can see how automatically lifting if over Nullable{Bool} arguments may not even be worth it. It depends on what a lifted if does in the face of a null condition. Does it just return Nullable{Bottom}? Does it throw an exception? The sort of behavior that could feasibly be given to such a implementation seems to limit its overall usefulness.

ScottPJones commented 8 years ago

I suppose a case could be made that it could return Nullable{Bool}, to preserve some idea of unknown.

malmaud commented 8 years ago

@davidagold That is exactly the same situation that probabilistic programming faces, so it's not unique to Nullable.

nalimilan commented 8 years ago

@malmaud I'd still be very interested in hearing more about that use case. How would it work with if?

malmaud commented 8 years ago

@nalimilan Say I have code that evaluates the probability that a random variable equals some variable, like

f(x)=x+1
x=DiscreteRandom(p_is_5=.4, p_is_7=.6)::RandomVariable{Int} # X represents a draw from a discrete distribution, with P(x=7) = .6
+(::RandomVariable,::RandomVariable)=...
y=f(x)
@assert find_probabiliy(y, 8) == .7  # since P((x+1)==8) == P(x==7)

This is all elegant because of + overloading. But say instead we defined f equivalently (in the scalar x case) as

function f(x)
if x==7
  return 8
else
 ...
end

Then we are out of luck.

The ideal situation would be something like, for x::RandomVariable{Bool},

if x
 1
else
- 1
end

would evaluate to something like RandomIfStatement(x, 1, -1)::RandomVariable{Int}.

If if x; y; else z; lowered to a generic function like ifexpression(x,y,z), you could overload it for different types of x, x::Bool is the only defined method now. The problem with that solution of course is that y and z will both evaluate. But if ifexpression was a staged function, then maybe this could work.

@zenna might have thoughts on this.

nalimilan commented 8 years ago

@malmaud Looks like applying the principle of multiple dispatch everywhere. That's an interesting investigation, but one with far-reaching consequences that I don't even dare imagine... Maybe you can get Jeff interested by this new challenge, after all. :-p

I guess we can close this issue, let's see how far we can get without this special support for Nullable{Bool}.

zenna commented 8 years ago

I'm not sure about Nullables but in general the limitations on if and other short-cicuiting functions like || and && make it difficult/impossible to do static analysis by means of symbolic execution in Julia. There are other ways to do static analysis, e.g. analysing the source code (and Julia's introspection makes this easier than in most languages), but using symbolic execution is simple and means your analysis can exploit many properties already in the interpreter/compiler of the language.

My use case is in probabilistic programming. We have a random variable type RandVar{T} where T is the type of value. The goal is that you could use a RandVar{Float64} any where you use a Float64 and a RandVar{Bool} anywhere you use a Bool.

And you can, for the most part, with the glaring exception of using RandVar{Bool}s in functions that short-circuit.

It's not so much of an issue if all of your code is your own, then you can just use ifelse. For example, here is a cancer model that you can do in Sigma

cancer_base_rate = 0.01
breast_cancer = flip(cancer_base_rate)
positive_mammogram = ifelse(breast_cancer,flip(0.8),flip(0.096))
prob(breast_cancer, positive_mammogram)

The real problem is that when I want to use libraries written by other people where they use a normal if. It's frustrating because many functions almost work (if they've written their code sufficiently generally) but not quite, usually because of short-circuiting functions.

I have no idea where these limitations are fundamental; but if they're not, there are real use cases.

malmaud commented 8 years ago

@nalimilan I quickly wrote a simple package to demo how easy in principle it would be for Julia to allow customizing the behavior of control structures based on the type of the predicate.

carnaval commented 8 years ago

I don't think this approach handles side effects correctly ? You would at least need to pass lambdas to be able to control execution. You might for example want to run a loop until "convergence", i.e., P(eval(exit_cond) == true) > 1-eps ?

ScottPJones commented 8 years ago

This may be all wet, but could this be handled by having the compiler simply lower things to something like:

handle_cond(cond::Bool) = cond
if cond -> if handle_cond(cond) ...
while cond -> while handle_cond(cond) ...

and then overriding handle_cond for Nullables?

If the compiler can return typeof(b) without evaluating b, could it do something like the following, where _ is just a temp variable?

a || b -> (handle_cond(_ = (a), typeof(b)) ? _ : (b))
a && b -> (handle_cond(_ = (a), typeof(b)) ? (b) : _)
JeffBezanson commented 8 years ago

It's certainly easy to insert, say, convert(Bool, x) or condition(x) around every condition in the front end. I've resisted this because I really dislike "truthiness"; e.g. a program should say if isempty(x) instead of if x and leaving you guessing what the real condition is.

Of course that feature is also totally inadequate for @zenna's case. There you need to evaluate both branches of the if, and then pass all results to some other function, based on the type of the condition. That's a very tall order! Then there are also while and @goto. My guess is that a good way to handle this is to use @carnaval's abstract interpreter framework, which supports running programs over arbitrary domains. For such a dramatic change to semantics it seems to me you need a different evaluator.

StefanKarpinski commented 8 years ago

I agree with Jeff 100% here. This would solve only a little bit of the whole problem in a very patchwork sort of way and just leave you wanting more. Abstract interpretation is the correct approach. @carnaval, how about doing some kind of announcement and open sourcing of the green fairy?

tkelman commented 8 years ago

https://github.com/carnaval/green-fairy but it could use a license and some "what this code do" docs

malmaud commented 8 years ago

@carnaval I'm not sure I see the problem when evaluating the conditional causes side effects-Wouldn't simply translating if a; b to my_custom_if(a, b) be fine, since a is evaluated in the same scope in both cases?

It's true though if both branches of the if statement have side effects and you evaluate both branches, then both side effects will occur. That's a definite limitation. But if you deliberately write the branches to be side-effect-free, then it seems easy to evaluate both branches and dispatch the results to some other function based on the type of the condition.