JuliaStats / DataArrays.jl

DEPRECATED: Data structures that allow missing values
Other
53 stars 50 forks source link

RFC: Stop lying about eltype #280

Closed andreasnoack closed 7 years ago

andreasnoack commented 7 years ago

As discussed on Slack. It was actually not that bad. The core changes are in abstractdataarray.jl and defines

abstract type AbstractDataArray{T, N} <: AbstractArray{Union{T,NAtype}, N} end

Base.eltype{T, N}(d::AbstractDataArray{T, N}) = Union{T,NAtype}

So for subtypes of AbstractDataArray, the T is not referring to union but for signatures using AbstractArray the T refers to the union. This requires a little care when we ping-ping between Base and DataArrays definitions such as broadcast and similar. For that, it was convenient to define an extractT function to return the T part of a Union{T,NAtype}.

The changes here also require some changes to DataFrames. Hopefully, I'll be able to open a PR with them tomorrow and then we can look at the whole picture.

cc: @iamed2, @jeffbezanson

Update: Realized that an eltype definition is not needed here anymore since the fallback for AbstractArrays would now be correct.

andreasnoack commented 7 years ago

Thanks for the comments. I'm not sure how it will feel like in user code. We'll probably have to try it out a bit. I made the changes to DataFrames to get a feel for how much would have to be changed in library code. I don't think it was that bad and it will hopefully open up for using DataArrays in function not written explicitly for them.

My understanding is that Null is basically a reimplementation of NAtype with the main benefit that it lives in a separate package instead of being bundled with two array types. I can see that the semantics for comparisons are a bit different from NA. I could try making a commit with that change but I'm not sure it is worth bundling the two changes into the same PR.

ararslan commented 7 years ago

I could try making a commit with that change but I'm not sure it is worth bundling the two changes into the same PR.

👍 I think we should go ahead with this and try out the Nulls change in a separate branch. Luckily since the name is different (null vs NA), we should be able to put some deprecations in place that will ensure people make the right comparisons for null.

nalimilan commented 7 years ago

Sure, I didn't mean we should move to Nulls.jl in this PR, just that it could make sense to make both changes before tagging a release (if we decide to do both).

ararslan commented 7 years ago

Bump.

andreasnoack commented 7 years ago

Updated. The test failure on master is most likely because of the macrocalypse going on on master right now. Hopefully, it'll either be fixed or will betold how to fix DataArrays in a couple of days.

andreasnoack commented 7 years ago

I think this is ready but right now it causes a segfault on master

rofinn commented 7 years ago

Hooray, thanks @andreasnoack!