Open schlichtanders opened 6 months ago
Attempting to construct the minimal object:
julia> using CategoricalArrays, OneHotArrays
julia> cv = CategoricalArrays.CategoricalValue('b', CategoricalArray('a':'z'))
CategoricalValue{Char, UInt32} 'b'
julia> dump(cv)
CategoricalValue{Char, UInt32}
  pool: CategoricalPool{Char, UInt32, CategoricalValue{Char, UInt32}}
    levels: Array{Char}((26,))
      1: Char 'a'
      2: Char 'b'
      3: Char 'c'
      4: Char 'd'
      5: Char 'e'
      ...
      22: Char 'v'
      23: Char 'w'
      24: Char 'x'
      25: Char 'y'
      26: Char 'z'
    invindex: Dict{Char, UInt32}
      slots: Memory{UInt8}
        length: Int64 64
        ptr: Ptr{Nothing} @0x0000000160607020
...
julia> cv.pool.levels
26-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
...
julia> Int(cv.ref), length(cv.pool.levels)
(2, 26)
julia> OneHotArrays.onehot(cv::CategoricalValue) = OneHotVector(cv.ref, length(cv.pool.levels))
julia> onehot(cv)
26-element OneHotVector(::UInt32) with eltype Bool:
⋅
1
⋅
⋅
⋅
⋅
...
julia> dump(onehot(cv))
OneHotVector{UInt32}
  indices: UInt32 0x00000002
  nlabels: Int64 26
Are these two integers all that's required, or are there more complicated examples?
I think this is all, but I am not an expert on CategoricalArrays.
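For comparison, here is a minimal sketch of the same idea written against the exported accessors `levelcode` and `levels` instead of the internal `ref` and `pool` fields. The helper names `categorical_onehot` and `categorical_onehotbatch` are hypothetical and not part of either package. The sketch assumes no `missing` entries and that `OneHotVector` and `OneHotMatrix` accept an integer index (or a vector of them) plus an `Int` label count.

```julia
using CategoricalArrays, OneHotArrays

# Hypothetical helpers, not part of CategoricalArrays or OneHotArrays.
# levelcode(x) is the position of x among the levels; levels(x) lists the levels.
categorical_onehot(cv::CategoricalValue) =
    OneHotVector(UInt32(levelcode(cv)), length(levels(cv)))

# Assumes v has no missing entries: levelcode(missing) is missing,
# and converting that to UInt32 throws.
categorical_onehotbatch(v::CategoricalVector) =
    OneHotMatrix(UInt32.(levelcode.(v)), length(levels(v)))

v = categorical(['b', 'a', 'c', 'b'])
x = categorical_onehot(v[1])      # one-hot vector over the 3 levels of v
m = categorical_onehotbatch(v)    # one column per element of v
```

Defining separate helpers rather than adding a method to `OneHotArrays.onehot` avoids type piracy while the idea is being tried out. Whether the packages would want such a method, for instance in a package extension, is a separate question.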
Motivation and description
In data science, CategoricalArrays.CategoricalValue, CategoricalArrays.CategoricalVector and the like appear often (RDatasets loads DataFrames with columns of that type by default). It would be great if onehotbatch could simply be applied to them.
I have only just started using this package and am still figuring out how to transform such a categorical value or vector into a one-hot vector or matrix, so it is quite possible that I missed something obvious.
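One possible workaround that needs no new methods is to unwrap the categorical values and pass the raw values and raw levels to the existing `onehotbatch(data, labels)`. This is a sketch rather than an officially recommended approach. It assumes that `unwrap` is exported by the installed CategoricalArrays version and that the vector has no `missing` entries.

```julia
using CategoricalArrays, OneHotArrays

v = categorical(['b', 'a', 'c', 'b'])

# unwrap strips the categorical wrapper; applying it to the levels as well
# keeps this working whether levels(v) returns raw or categorical values.
labels = unwrap.(levels(v))
raw    = unwrap.(v)

m = onehotbatch(raw, labels)   # one row per level, one column per element
decoded = onecold(m, labels)   # maps the columns back to the raw values
```

Unlike using the level codes directly, this route looks up every value in the label list again, so it does some redundant work on large columns.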
Possible Implementation
No response