Closed davidagold closed 7 years ago
@@ master #25 diff @@
==========================================
Files 27 28 +1
Lines 305 335 +30
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 174 208 +34
+ Misses 131 127 -4
Partials 0 0
Powered by Codecov. Last update 8d16960...17bd207
@nalimilan That's a good idea re: null_safe_op
. So, we'd have something like the following? It looks like the value of null_safe_op
for a particular combination of f
and x
gets inlined during lowering and then the branch gets taken care of by llvm (maybe?)...
julia> function lift(f, x)
if Base.null_safe_op(f, eltype(x))
return Nullable(f(unsafe_get(x)), x.hasvalue)
else
U = Core.Inference.return_type(f, Tuple{eltype(typeof(x))})
if isnull(x)
return Nullable{U}()
else
return Nullable(f(unsafe_get(x)))
end
end
end
lift (generic function with 1 method)
julia> f(x::Int) = x
f (generic function with 1 method)
julia> Base.null_safe_op(::typeof(f), ::Type{Int}) = true
julia> x = Nullable(1)
Nullable{Int64}(1)
julia> @code_warntype lift(f, x)
Variables:
#self#::#lift
f::#f
x::Nullable{Int64}
U::Type{Int64}
Body:
begin
NewvarNode(:(U::Type{Int64}))
unless $(QuoteNode(true)) goto 5 # line 3:
return $(Expr(:new, Nullable{Int64}, :((Core.getfield)(x,:hasvalue)::Bool), :((Core.getfield)(x,:value)::Int64)))
5: # line 5:
U::Type{Int64} = $(QuoteNode(Int64)) # line 6:
unless (Base.box)(Base.Bool,(Base.not_int)((Core.getfield)(x::Nullable{Int64},:hasvalue)::Bool)) goto 12 # line 7:
return $(Expr(:new, Nullable{Int64}, false))
12: # line 9:
return $(Expr(:new, Nullable{Int64}, true, :((Core.getfield)(x,:value)::Int64)))
end::Nullable{Int64}
julia> @code_llvm lift(f, x)
define void @julia_lift_63473(%Nullable.4* noalias sret, %Nullable.4*) #0 {
top:
%.sroa.2 = alloca [7 x i8], align 1
%2 = getelementptr inbounds %Nullable.4, %Nullable.4* %1, i64 0, i32 0
%3 = load i8, i8* %2, align 1
%4 = and i8 %3, 1
%5 = getelementptr inbounds %Nullable.4, %Nullable.4* %1, i64 0, i32 1
%6 = load i64, i64* %5, align 1
%7 = getelementptr inbounds %Nullable.4, %Nullable.4* %0, i64 0, i32 0
store i8 %4, i8* %7, align 8
%8 = getelementptr inbounds i8, i8* %7, i64 1
%9 = getelementptr inbounds [7 x i8], [7 x i8]* %.sroa.2, i64 0, i64 0
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %8, i8* %9, i64 7, i32 1, i1 false)
%10 = getelementptr inbounds %Nullable.4, %Nullable.4* %0, i64 0, i32 1
store i64 %6, i64* %10, align 8
ret void
}
You probably already knew that, but I'd never looked at how the null_safe_op
business lowers. That's pretty cool.
Yes, I'm not sure where exactly the branch elimination is done, but what's certain is that it enables SIMD instructions in simple cases (cf. the recent BaseBenchmarks addition).
Though you won't be able to use the two-argument form of the Nullable()
constructor due to the reversal of isnull
in 0.6: you'll have to write the ifelse
manually like I did in my recent PR against NullableArrays.
I think we need some design decisions re: semantics here as far as isequal
and isless
goes -- mostly to cover how to treat mixed signatures. For instance, if we decide that, as far as sorting goes, we always want null values to be sorted last, then lift(isless, x::T, y::Nullable{S})
, where T
and S
are non-Nullable
types, shouldn't default to isless(x, y)
since that will MethodError
.
edit: Similarly, for isequal
, I think it's reasonable to expect that lift(isequal, 1, Nullable(1))
should return true
, so that will need its own implementation. We should probably include ==
in the blacklist, then, and it should have similar behavior to isequal
except that, since the former is a "stricter" and the latter is a "looser" sense of equality, we should have ==(Nullable{Int}(), Nullable{Int}()) == false
and isequal(Nullable{Int}(), Nullable{Int}()) == true
.
I'd say there are two different questions:
Bool
instead of Nullable{Bool}
. This includes isequal
and isless
, and maybe ==
, <
and >
(up for discussion, cf. the endless other thread)isequal
and isless
.We may want to support mixing scalars and nullables. I support this option, but I think that if we do it we should apply it for all functions, not just isequal and isless.
Same here, very much in favor of having support for mixing scalars and nullables. What I don't understand is whether the promotion system could be used for that or not.
I won't argue it more than this one post, but I'm not in favor of mixing scalars and nullables. It seems to defeat all of the safety arguments for having nullables. Indeed, I'd argue that 1 + Nullable(1)
is essentially an anti-promotion: you've gone from a safe scalar type to an unsafe scalar type that doesn't have any more precision, it just admits the dangerous NULL
value.
What I don't understand is whether the promotion system could be used for that or not.
I don't think we actually need it, since the lifting semantics imply calling get
on nullables when they are not null: lifting functions/macros would just pass scalars as-is (which unsafe_get
does now BTW). Anyway, the promotion rule for scalar and nullables is simple: it wraps the scalar inside a nullable. Promotion is mainly useful to choose the element type.
@johnmyleswhite The problem is to support things like select(df, y = 2x)
inside macros. Writing y = Nullable(2) * x
would be really weird, since 2
is actually the only value in the expression which is not originally a nullable.
Ok, but we already resolved the base case of select(df, y = 2x)
with automatic lifting. If you're opposed to automatic lifting, I can see why you want to be able to deal with that, but I think that just begs the question why you need to keep adding more custom functionality to work around non-automatic lifting.
To be clear, I was speaking about what to do in automatic lifting contexts. What to do outside of them is another subject for endless discussions. :-)
To be clear, I was speaking about what to do in automatic lifting contexts.
So I'm guessing things have changed a lot since I wrote my notes about this in June/July? In the draft I wrote, you don't ever need mixed operations -- everything computation happens in the non-nullable space except for null-ness checks.
I see two uses for mixed-signature lifting:
Nullable
and non-Nullable
eltype columns. I know you're against this, John, but they may happen.Nullable
values into query argument expressions.So I'm guessing things have changed a lot since I wrote my notes about this in June/July? In the draft I wrote, you don't ever need mixed operations -- everything computation happens in the non-nullable space except for null-ness checks.
That's basically what this does?
Okay, let me clarify some things for myself:
lift
+ generic isnull
/unsafe_get
supports mixed-signature lifting.isless
and isequal
-- they are rather to ensure that certain invariants propagate, e.g. null
is always greater than a non-null
, and isequal(null, null) == true
(or false
, I don't really care so long as it's clearly documented). Put another way, isless
and isequal
receive non-standard lifting semantics, and so require special lift
methods, similar to how |
and &
require special methods to implement 3VL semantics.lift
function. John, if I'm mistaken, would you please clarify?Here's my interpretation of how this differs from John's notes. Suppose one does select(src, B = 2A)
. In John's notes, we generate a kernel row -> 2 * row[1]
. Then we iterate over zip(src[:A])
, checking at each iteration if the current row (of zip(src[:A])
) has nulls. If it does, we insert a null into the result column B
. If not, we map unwrap
(equivalent to Base.unsafe_get
) over the row, apply the kernel, and insert the result into B
. Thus, the null checking happens in the iteration
In the paradigm implemented in this PR (and adapted to by https://github.com/davidagold/AbstractTables.jl/pull/5) we instead generate the kernel row -> lift(*, 2, row[1])
and essentially just map this over zip(src[:A])
. That is, we iterate over zip(src[:A])
, but rather than checking nulls before applying the kernel, we just apply the kernel and let the lift
method handle all null checks.
It seems that the biggest difference entailed by the design implemented in this PR is that lift
doesn't distinguish between 2
and row[1]
as arguments -- that is, it doesn't see row[1]
as an element of some column and 2
as a constant scalar. They're both just arguments and receive equal treatment -- i.e., null checks and unsafe gets. On the other hand, doing null checks during iteration as in John's notes singles out row[1]
as an element of a Nullable
eltype column and restricts the null checks to it.
However, since isnull
is invariant over values of non-Nullable
types, it should not contribute any unnecessary code load (ideally). unsafe_get
should be similarly optimized away at compile time. So in practice --both in terms of semantics and in terms of code generation -- I don't see a huge difference between John's notes and the present implementation. Insofar as both strategies support select(src, B = 2*A)
, where the eltype
of B
is Nullable
, they both support "mixed-signature" lifting. And insofar as the user doesn't need to do anything other than wrap select(src, B = 2*A)
in a macro in order for lifting to work, such lifting is "automatic". So both approaches implement "automatic, mixed signature lifting". Neither approach ever defines a method *(::Int, Nullable{T})
; mixed signature lifting is NOT achieved by method extension. It is purely a consequence of the present implementation of lift
combined with generic isnull
and unsafe_get
functions.
Please, somebody tell me if I'm just crazy and/or missing something.
I can also see how my comments regarding isless
and my use of the term "mixed-signature lifting" may be confusing. My original point was that, in Base, isless(x::Nullable, y::Nullable)
behaves in such a way so that null values are never isless
than non-null values. Thus, isless
does not follow standard lifting semantics; it propagates an invariant in a way similar to |
and &
in three-valued logic. So, what should lift(isless, x, y)
do? We can't just have lift(isless, x, y)
fall back on isless(x, y)
because this will MethodError
if one attempts to do, say filter(src, 2 < A)
when the eltype
of A
is Nullable
. My solution is not to define isless{T, S}(x::T, y::Nullable{S})
, i.e. lift by method extension. Rather, my solution, which I will push shortly, is to define
@inline function lift(::typeof(isless), x, y)::Bool
return ifelse( isnull(x),
ifelse( isnull(y),
false, # x, y null
false # x null, y not null
),
ifelse( isnull(y),
true, # x not null, y null
isless(unsafe_get(x), unsafe_get(y)) # x, y not null
)
)
end
So this is "mixed-signature" lifting in the sense that it gives proper lifted semantics to filter(src, 2 < A)
but it is not "mixed-signature" in the sense that it relies on any method defined on a mixed signature. This is pretty much the same as the above example of how filter(src, A | true)
would work using the 3VL truth-tables implemented in this PR. One might say that the mixed-signature part of such lifting is handled by the "automatic" facility, which is just the syntactic transformation of f(xs...)
to lift(f, xs...)
within the call graph of a UDF expression.
Sorry to spam this thread with my yabbering. Does this help to clarify?
EDIT: Some change will have to be made to the above implementation of lift(::typeof(isless), x, y)
to handle cases when the argument types are not "null safe", but the spirit of the solution remains the same.
Sounds like a good approach to me. Implementing the logic in a lift
function allows separating it from the query macros, which sounds cleaner and makes it reusable.
Sounds like a good approach to me. Implementing the logic in a lift function allows separating it from the query macros, which sounds cleaner and makes it reusable.
And this strategy allows for three-valued logic semantics.
Hmm, I did a bunch of rebasing, so the diff has come out rather strange. I think the lesson there is, rename/reorganize files in a different branch, not while you're doing work on them. =/
EDIT: Also, TIL null_safe_op
is not defined for 0.5.
https://github.com/davidagold/AbstractTables.jl/pull/5 Looks good locally, and now includes a handful of tests of non-standard lifting semantics. (It also uncovered what I think is a substantial oversight concerning our syntax decisions. TODO: document this elsewhere and x-ref.) I'm going to look this PR over in the morning and then likely merge so I can make the other core changes to the internals and then lay out some semi-stable documentation.
I'm leaving this as WIP until I coordinate it across AbstractTables as well. This implements the higher-order lifting strategy described in https://github.com/davidagold/AbstractTables.jl/issues/2. The only noticeable user-facing change is that
|
and&
now exhibit three-valued logic semantics, so long as they are visible to the querying macro:cc @johnmyleswhite
TODO:
check function names against a (modifiable) "blacklist" of functions that don't receive standard lifting semantics. This list should include (at least):Implement non-standard lifting semantics forisnull
,isequal
,isless
,coalesce
(which we'd need to implement).isnull(x)
,isless(x, y)
,,isequal(x, y)
get(x::Nullable)
,get(x::Nullable, y)
by defininglift(method, ...)
.