JuliaAI / ScientificTypes.jl

An API for dispatching on the "scientific" type of data instead of the machine type
MIT License

ScientificTypes.jl


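This package distinguishes between the machine type and the scientific type of a Julia object. The machine type is the Julia type used to represent the object (for example, Float64), while the scientific type describes how the object should be interpreted (for example, Continuous or Count).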
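
As a minimal sketch of the machine type versus scientific type distinction, the snippet below compares `typeof` with `scitype` on two scalars. It assumes the default convention shipped with the package, under which floats are interpreted as Continuous and integers as Count (as in the schema tables in the quick start); the variable names are illustrative only.

```julia
using ScientificTypes

x = 3.14
typeof(x)    # machine type: Float64
scitype(x)   # scientific type: Continuous

n = 7
typeof(n)    # machine type: Int64 on a 64-bit system
scitype(n)   # scientific type: Count
```

The same integer could sensibly be a count or a class label; the scientific type records which interpretation is intended, independently of how the value is stored.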


Installation

using Pkg
Pkg.add("ScientificTypes")

Who is this repository for?

What's provided here?

The module ScientificTypes defined in this repo re-exports the scientific types and associated methods defined in ScientificTypesBase.jl and provides:

Very quick start

For more information and examples please refer to the manual.

using ScientificTypes, DataFrames
X = DataFrame(
    a = randn(5),
    b = [-2.0, 1.0, 2.0, missing, 3.0],
    c = [1, 2, 3, 4, 5],
    d = [0, 1, 0, 1, 0],
    e = ['M', 'F', missing, 'M', 'F'],
    )
sch = schema(X)

will print

┌───────┬────────────────────────────┬─────────────────────────┐
│ names │ scitypes                   │ types                   │
├───────┼────────────────────────────┼─────────────────────────┤
│ a     │ Continuous                 │ Float64                 │
│ b     │ Union{Missing, Continuous} │ Union{Missing, Float64} │
│ c     │ Count                      │ Int64                   │
│ d     │ Count                      │ Int64                   │
│ e     │ Union{Missing, Unknown}    │ Union{Missing, Char}    │
└───────┴────────────────────────────┴─────────────────────────┘

Individual fields of the schema are available as properties; for example:

julia> sch.names
(:a, :b, :c, :d, :e)
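
The other columns of the printed schema can be read off in the same way. The following is a small sketch that uses a named tuple of vectors as the table (an assumption made here to avoid the DataFrames dependency; any Tables.jl-compatible table should behave the same way) and reads the `scitypes` and `types` properties alongside `names`.

```julia
using ScientificTypes

# A small column table: a named tuple of equal-length vectors
X = (a = [1.0, 2.0, 3.0], c = [1, 2, 3])

sch = schema(X)

sch.names      # column names as a tuple of symbols
sch.scitypes   # element scientific type of each column
sch.types      # element machine type of each column
```

Each property is a tuple with one entry per column, in column order, so the three tuples can be zipped together when inspecting a table programmatically.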

To specify that b should instead be regarded as Count, and that both d and e are Multiclass, we use the coerce function:

Xc = coerce(X, :b=>Count, :d=>Multiclass, :e=>Multiclass)
schema(Xc)

which prints

┌───────┬───────────────────────────────┬────────────────────────────────────────────────┐
│ names │ scitypes                      │ types                                          │
├───────┼───────────────────────────────┼────────────────────────────────────────────────┤
│ a     │ Continuous                    │ Float64                                        │
│ b     │ Union{Missing, Count}         │ Union{Missing, Int64}                          │
│ c     │ Count                         │ Int64                                          │
│ d     │ Multiclass{2}                 │ CategoricalValue{Int64, UInt32}                │
│ e     │ Union{Missing, Multiclass{2}} │ Union{Missing, CategoricalValue{Char, UInt32}} │
└───────┴───────────────────────────────┴────────────────────────────────────────────────┘
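
coerce can also be applied to a single vector rather than a whole table. The sketch below, with illustrative variable names, converts an integer vector to Continuous and a vector of characters to Multiclass, then checks the result with `elscitype`, which reports the scientific type of a vector's elements.

```julia
using ScientificTypes

v = [1, 2, 3]
elscitype(v)                  # Count

w = coerce(v, Continuous)     # integers converted to floats
eltype(w)                     # Float64
elscitype(w)                  # Continuous

labels = coerce(['M', 'F', 'M'], Multiclass)
elscitype(labels)             # Multiclass{2}, since two distinct classes are present
```

Coercing to Multiclass changes the machine representation to a categorical one, which is why columns d and e in the table above show CategoricalValue machine types.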

Acknowledgements and history

ScientificTypes is based on code from MLJScientificTypes.jl (now deprecated) and in particular builds on contributions of Anthony Blaom (@ablaom), Thibaut Lienart (@tlienart), Samuel Okon (@OkonSamuel), and others not recorded in the ScientificTypes commit history.

ScientificTypes.jl 2.0 implements the DefaultConvention, which coincides with the deprecated MLJ convention of MLJScientificTypes.jl 0.4.8. The code at ScientificTypes 1.1.2 (which defined only the API) became ScientificTypesBase.jl 1.0.