This package makes a distinction between the machine type and the scientific type of a Julia object:
The machine type refers to the Julia type being used to represent the object (for instance, Float64).
The scientific type is one of the types defined in ScientificTypesBase.jl reflecting how the object should be interpreted (for instance, Continuous or Multiclass).
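The distinction can be seen by querying both types for the same object. The following is a minimal sketch, assuming ScientificTypes is installed and the default convention is in use; the variable names are illustrative only.

```julia
using ScientificTypes

x = 3.14
typeof(x)     # machine type: Float64
scitype(x)    # scientific type: Continuous

n = 42
typeof(n)     # machine type: Int64 on a 64-bit system
scitype(n)    # scientific type: Count

# For arrays, elscitype reports the scientific type of the elements
v = [1.0, 2.0, missing]
elscitype(v)  # Union{Missing, Continuous}
```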
using Pkg
Pkg.add("ScientificTypes")
The module ScientificTypes defined in this repo re-exports the scientific types and associated methods defined in ScientificTypesBase.jl and provides:
a collection of scitype definitions that articulate a default convention
a coerce function, for changing machine types to reflect a specified scientific interpretation (scientific type)
an autotype function for "guessing" the intended scientific type of data
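As a rough sketch of how autotype and coerce might be combined, the example below asks for suggestions on a small table and then applies them. The data frame `df` and its column names are hypothetical, and the exact suggestions depend on the rules autotype applies, so no output is shown here; the manual describes the available rules.

```julia
using ScientificTypes, DataFrames

# Hypothetical data: one identifier-like column and one column
# with only two distinct integer values
df = DataFrame(
    id = 1:100,
    flag = repeat([0, 1], 50),
)

# autotype returns a dictionary of suggested scientific types,
# keyed by column name
suggestions = autotype(df)

# The suggestions can be passed directly to coerce
df2 = coerce(df, suggestions)
schema(df2)
```

Inspecting `suggestions` before coercing is a reasonable precaution, since a guess based on the data alone may not match the intended interpretation.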
For more information and examples please refer to the manual.
using ScientificTypes, DataFrames
X = DataFrame(
a = randn(5),
b = [-2.0, 1.0, 2.0, missing, 3.0],
c = [1, 2, 3, 4, 5],
d = [0, 1, 0, 1, 0],
e = ['M', 'F', missing, 'M', 'F'],
)
sch = schema(X)
will print
┌───────┬────────────────────────────┬─────────────────────────┐
│ names │ scitypes                   │ types                   │
├───────┼────────────────────────────┼─────────────────────────┤
│ a     │ Continuous                 │ Float64                 │
│ b     │ Union{Missing, Continuous} │ Union{Missing, Float64} │
│ c     │ Count                      │ Int64                   │
│ d     │ Count                      │ Int64                   │
│ e     │ Union{Missing, Unknown}    │ Union{Missing, Char}    │
└───────┴────────────────────────────┴─────────────────────────┘
Detail is obtained in the obvious way; for example:
julia> sch.names
(:a, :b, :c, :d, :e)
To specify instead that b should be regarded as Count, and that both d and e are Multiclass, we use the coerce function:
Xc = coerce(X, :b=>Count, :d=>Multiclass, :e=>Multiclass)
schema(Xc)
which prints
┌───────┬───────────────────────────────┬────────────────────────────────────────────────┐
│ names │ scitypes                      │ types                                          │
├───────┼───────────────────────────────┼────────────────────────────────────────────────┤
│ a     │ Continuous                    │ Float64                                        │
│ b     │ Union{Missing, Count}         │ Union{Missing, Int64}                          │
│ c     │ Count                         │ Int64                                          │
│ d     │ Multiclass{2}                 │ CategoricalValue{Int64, UInt32}                │
│ e     │ Union{Missing, Multiclass{2}} │ Union{Missing, CategoricalValue{Char, UInt32}} │
└───────┴───────────────────────────────┴────────────────────────────────────────────────┘
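The coerce function is not limited to tables. The sketch below, which assumes the default convention, applies it to a plain vector; the variable names are illustrative only.

```julia
using ScientificTypes

v = [1, 2, 1, 2, 2]
scitype(v)                  # AbstractVector{Count}

# Reinterpret the integers as class labels
vc = coerce(v, Multiclass)
elscitype(vc)               # Multiclass{2}

# Reinterpret the integers as continuous values
vf = coerce(v, Continuous)
eltype(vf)                  # Float64
```

In both cases the machine type changes so that the new representation carries the requested scientific type.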
ScientificTypes is based on code from MLJScientificTypes.jl (now deprecated) and in particular builds on contributions of Anthony Blaom (@ablaom), Thibaut Lienart (@tlienart), Samuel Okon (@OkonSamuel), and others not recorded in the ScientificTypes commit history.
ScientificTypes.jl 2.0 implements the DefaultConvention, which coincides with the deprecated MLJ convention of MLJScientificTypes.jl 0.4.8. The code at ScientificTypes 1.1.2 (which defined only the API) became ScientificTypesBase.jl 1.0.