Open ExpandingMan opened 7 years ago
The other solution is to define methods like in https://github.com/JuliaStats/NullableArrays.jl/pull/185.
Can you develop your use case for not lifting?
I think #185 is a good example.
Suppose I want to run through a dataframe, determine which rows are null, and then do something. For example
f(x) = isnull(x) ? false : (get(x) % 3 == 0)
Admittedly, at the moment I cannot think of a use which cannot be achieved using lifting along with isnull if it returns a Vector{Bool} (which currently it doesn't do), but I think in some cases it's just easier for the user to write a function like the above rather than dealing with isnull.
Just thinking out loud here: f.(x) .& isnull.(x) wouldn't work even if isnull does return a Vector{Bool}, so I'm not quite seeing an alternative as simple as writing an f(::Nullable) and then doing f.(v). And that's in the case where the return values are simple Bools; suppose they were more complicated objects.
One of the main use cases for null handling in which you really don't want to just return Nullable is to replace the nulls with samples from a distribution. However, I suppose in such a case you really have to write a "manual" loop anyway.
This is one of my concerns with automatic lifting. I'm hoping that with the planned Nullable revamp in Julia 1.0, map and broadcast will not lift. IMO it doesn't make sense to use those functions on Nullables at all. But that's an aside, not to mention it doesn't help us in the short-term (0.5 and 0.6)...
To answer your question, to apply a function elementwise without lifting, you could use a comprehension, e.g. [f(x) for x in X]. This can't be applied in-place though.
In theory it's annoying that we don't provide a way to disable lifting, but in practice I'd like to see a real use case before marking this issue as something that needs to be fixed in the short term. As @ararslan noted, I hope the nullable rework will help us move beyond this.
Hm. I'm trying to think if there is some deeper reason why you should never lift. For the record, if isnull returns an AbstractVector{Bool}, to replace nulls you can do something like
v[find(isnull.(v))] .= rand(sum(isnull.(v)))
which is pretty damn elegant if you ask me.
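For readers on Julia 1.x, here is a rough sketch of the same replace-the-nulls idea. It rests on assumptions that are not part of this thread: missing is used as a stand-in for the null value (Nullable is no longer in Base), logical indexing takes the place of find, and uniform rand is only a placeholder for whatever distribution you actually want.

```julia
# Sketch only: `missing` stands in for a null Nullable (assumption).
v = Union{Float64,Missing}[1.0, missing, 3.0, missing]

# Boolean mask of the null positions.
nullmask = ismissing.(v)

# Draw one sample per null and write the samples into those positions.
# rand(n) draws from Uniform[0, 1); swap in your own sampler here.
v[nullmask] .= rand(count(nullmask))
```

The mask is computed once and reused, so the null positions and the number of samples cannot get out of sync.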
However, I keep coming back to the example of "selecting" rows. In my original example, the task was "Let me select elements defined by f, excluding nulls". That seems pretty simple, but I still am not seeing a simple way of doing that. That is most definitely something I've had to do before and will have to do again. Is there some simple way of doing it that I can't see?
I'm a pretty big proponent of trying to get Nullable (or whatever the "data null type" is to be called) to behave as similar to NaN as possible (for years and years I only ever had to worry about NaNs and, not only was it never a problem, but I never even had to give it much thought), so the idea of always lifting does have a certain appeal to me. However, the mere fact that !(Nullable{T}() isa T) makes Nullables very different from NaNs, and I don't see any way of changing that.
Edit: By the way, I know this isn't really the appropriate place to ask, but are there still plans to split Nullable into two types (one for data where the default behavior is lifting, and another which doesn't get lifted by default)? Because that seemed to me like a pretty good idea.
Yes, a general idea would be to have two separate null-like concepts, one like NA in DataArrays (modeled after its namesake in R and behaves like a scalar) and one like Option in Rust (behaves like a collection of 0 or 1 elements, similar to current Nullable but hopefully without the arithmetic).
There are querying frameworks for selecting rows, or you should be able to do something like dt[[f(x) for x in dt[:column]], :] for dt a DataTable.
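A minimal sketch of that mask-based row selection, written for Julia 1.x under assumptions that are not from this thread: a NamedTuple of column vectors stands in for the DataTable, missing stands in for a null entry, and the column names id and x are made up for illustration.

```julia
# Sketch only: a NamedTuple of columns stands in for a DataTable (assumption).
table = (id = [10, 20, 30, 40], x = [3, missing, 4, 9])

# Predicate that treats nulls as "not selected" rather than propagating them.
f(x) = ismissing(x) ? false : (x % 3 == 0)

# The comprehension calls f on each element directly, so the mask is a
# plain Vector{Bool} with no null entries.
keep = [f(x) for x in table.x]

# Apply the same row mask to every column.
subset = map(col -> col[keep], table)
```

Because f decides what a null means, the mask can be used for indexing without any extra null check.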
I take it the new NA would be a wrapper like Nullable so as to avoid the current type stability issues with DataArrays?
No, it will be a Union as it is now, but there are planned optimizations for Unions in 1.0.
Suppose I don't want to lift. Is there something like, for instance,
Since both broadcast! and map are overridden, the only way of handling this currently, as far as I can tell, is to create a wrapper function that runs your inner function through a "manual" loop.
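The wrapper with a "manual" loop could look roughly like the sketch below. It is written for Julia 1.x, where missing stands in for a null value (an assumption, since Nullable is no longer in Base), and map_nolift! is a hypothetical helper name, not an existing function in NullableArrays or Base.

```julia
# Hypothetical helper: apply f to each element of src and store the raw
# result in dest, bypassing any map/broadcast! overrides on the array type.
function map_nolift!(f, dest::AbstractVector, src::AbstractVector)
    # eachindex(dest, src) throws if the two arrays do not share indices.
    for i in eachindex(dest, src)
        dest[i] = f(src[i])
    end
    return dest
end

# Predicate that handles the null case itself instead of relying on lifting.
f(x) = ismissing(x) ? false : (x % 3 == 0)

v = [3, missing, 4, 9]
out = Vector{Bool}(undef, length(v))
map_nolift!(f, out, v)
```

Unlike a comprehension, this writes into a preallocated destination, which covers the in-place case.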