Open adkabo opened 2 years ago
@adkabo, thank you for your suggestion. DataKnots borrowed the missing semantics from SQL, where `NULL` being equivalent to `FALSE` is the default and, in fact, the only option. But I see your point, and perhaps it's worth revisiting.
Hello DataKnots team,
I'm looking into the DataKnots project and I'm excited about what I see. It looks like a very powerful tool.
I do have an issue that I'd like to discuss. In my use cases, values are "missing not at random", and I need to treat them with caution. For example, it might be that the lowest true values are always unobserved. Naive behavior when filtering, joining, or aggregating on missing values will lead me to incorrect conclusions.
In base Julia, `filter` lets me be confident I'm not accidentally dropping significant missing values, because it throws an error when the predicate returns `missing`. DataKnots.jl, on the other hand, currently drops missing values silently.
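To illustrate the base Julia behavior described above, here is a minimal sketch. The vector `xs` and the threshold are made up for illustration; the point is only that a predicate returning `missing` raises an error, and that SQL-like dropping has to be requested explicitly (here with `coalesce`).

```julia
xs = [1, missing, 3]

# `missing > 1` evaluates to `missing`, not to a Bool, so Base `filter`
# throws a TypeError instead of quietly discarding the element.
threw = try
    filter(x -> x > 1, xs)
    false
catch err
    err isa TypeError
end

# Treating `missing` as `false` (the SQL-like choice) must be written out.
kept = filter(x -> coalesce(x > 1, false), xs)
```

With this approach, dropping rows whose predicate is `missing` is a visible decision in the code rather than a default.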
Using tools that require me to mentally track missingness and ensure rows aren't silently dropped takes effort I'd rather spend on other parts of my analysis. Tools like Missings.jl's `passmissing(f)(x)` and `f(skipmissing(xs))` make it easier to do this explicitly.
For more discussion, see https://github.com/JuliaData/DataFrames.jl/issues/2499 about joining tables on missing values.
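As a rough sketch of the explicit style mentioned above, assuming the Missings.jl package is installed (`skipmissing` itself lives in Base), the two idioms look like this. The data in `xs` is invented for illustration.

```julia
using Missings  # provides `passmissing`; `skipmissing` is available from Base

xs = [1.0, missing, 4.0]

# Propagate: `passmissing(sqrt)` returns `missing` for a `missing` input
# and applies `sqrt` otherwise, so no element disappears.
roots = passmissing(sqrt).(xs)

# Skip: the aggregate ignores missing values, and the call site says so.
total = sum(skipmissing(xs))

# Counting the missing entries keeps track of how many were skipped.
n_missing = count(ismissing, xs)
```

In both cases the handling of missing values is stated at the call site, which is the property I'd like to have in DataKnots as well.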