Closed. ExpandingMan closed this issue 6 years ago.
I think I'm wrong and there is a bit mask. This doesn't seem consistent with the Arrow standard. To be consistent with Arrow, wouldn't one have to have the references contain no nulls, but the array they reference contain nulls?
Still confused by this, but it doesn't seem actionable within Feather.jl.
The metadata for dictionary encoded data seems all screwed up to me, and I'm starting to think that this is a problem with the feather format itself.
As things are, missing values are currently dealt with by having a `-1` as a reference. The problem with this is that the `Metadata.PrimitiveArray` object that describes it has `null_count > 0` despite the fact that there is no null bitmask. This is inconsistent. This would be ok if there were something else indicating what is going on, like for instance if `encoding` is set to `DICTIONARY`, but currently this doesn't happen. In fact currently it looks like `encoding` is never `DICTIONARY` in any case. Something's got to give. I still think I'm missing something about what goes on in this case, but regardless the metadata seems confusing and inconsistent.
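To make the inconsistency concrete, here is a rough sketch in plain Python (not Feather.jl or any Arrow library API; every function name below is made up for illustration). It decodes the same references two ways: with `-1` as a missing-value sentinel, as described above, and with a separate validity bitmap of the kind Arrow uses (one bit per value, least-significant bit first). It assumes the references are a flat list of integers into a pool of values.

```python
from typing import List, Optional, Sequence


def null_count(refs: Sequence[int]) -> int:
    """Count missing values when -1 is used as the missing-value reference."""
    return sum(1 for r in refs if r == -1)


def decode_sentinel(refs: Sequence[int], pool: Sequence[str]) -> List[Optional[str]]:
    """Decode references where -1 marks a missing value (no bitmap involved)."""
    return [None if r == -1 else pool[r] for r in refs]


def pack_validity_bitmap(refs: Sequence[int]) -> bytes:
    """Build an LSB-first validity bitmap: bit i is 1 when refs[i] is present."""
    bitmap = bytearray((len(refs) + 7) // 8)
    for i, r in enumerate(refs):
        if r != -1:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def decode_with_bitmap(
    refs: Sequence[int], bitmap: bytes, pool: Sequence[str]
) -> List[Optional[str]]:
    """Decode references using the bitmap; the reference is ignored when the bit is 0."""
    out: List[Optional[str]] = []
    for i, r in enumerate(refs):
        valid = (bitmap[i // 8] >> (i % 8)) & 1
        out.append(pool[r] if valid else None)
    return out


# Hypothetical data: a pool of three values and five references, two of them missing.
pool = ["a", "b", "c"]
refs = [0, -1, 2, 1, -1]

bitmap = pack_validity_bitmap(refs)
sentinel_result = decode_sentinel(refs, pool)
bitmap_result = decode_with_bitmap(refs, bitmap, pool)
```

Both decoders give the same values here, but only the second one has a bitmap to back up a nonzero null count. A reader that only looks at the metadata and sees `null_count > 0` would expect the bitmap to exist, which is the mismatch described above.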