JuliaAI / MLJModels.jl

Home of the MLJ model registry and tools for model queries and model code loading
MIT License

Interaction transformer #476

Closed olivierlabayle closed 1 year ago

olivierlabayle commented 2 years ago

Hi,

For my project I need to generate interactions of variables up to a specific order. I couldn't find an implementation in the Julia ecosystem, so I have written a custom one. Would this be of general interest, and is this package the correct place to share it?

For now it looks like the following:

using MLJBase
using Combinatorics
using Tables

mutable struct InteractionTransformer <: Static 
    order::Int
    colnames::Union{Nothing, Vector{Symbol}}
end
InteractionTransformer(;order=2, colnames=nothing) = InteractionTransformer(order, colnames)

interactions(columns, order) = 
    collect(Iterators.flatten(combinations(columns, i) for i in 2:order))

actualcolumns(colnames::Nothing, table) = Tables.columnnames(table)

function actualcolumns(colnames::Vector{Symbol}, table)
    # Requested columns that are missing from the table
    diff = setdiff(colnames, Tables.columnnames(table))
    isempty(diff) || throw(ArgumentError(string("Columns ", join(diff, ", "), " are not in the dataset")))
    return colnames
end

function interaction(columns, variables...)
    .*((Tables.getcolumn(columns, var) for var in variables)...)
end

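function MLJBase.transform(model::InteractionTransformer, _, X)
    # Use the user-specified columns if given, otherwise every column of X
    colnames = actualcolumns(model.colnames, X)
    # All combinations of 2, 3, ..., `order` columns
    interactions_ = interactions(colnames, model.order)
    # Name each interaction column by joining its factors, e.g. :A_B
    interaction_colnames = Tuple(Symbol(join(inter, "_")) for inter in interactions_)
    columns = Tables.Columns(X)
    interaction_table = NamedTuple{interaction_colnames}([interaction(columns, inter...) for inter in interactions_])
    # Append the interaction columns to the original columns
    return merge(Tables.columntable(X), interaction_table)
end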

X = (A = [1, 2, 3], B = [4, 5, 6], C = [7, 8, 9])

model = InteractionTransformer(order = 3)

MLJBase.transform(model, nothing, X)
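
For reference, the result expected from the call above can be checked without MLJBase, Tables or Combinatorics. The following is a minimal, dependency-free sketch rather than the implementation proposed here: `combos` is a hypothetical stand-in for `Combinatorics.combinations`, and the column naming (factors joined with "_") simply mirrors the code above.

```julia
# Hypothetical helper (not from any package): all k-element combinations
# of `items`, returned in lexicographic order of position.
function combos(items::Vector, k::Int)
    k == 0 && return [eltype(items)[]]
    length(items) < k && return Vector{eltype(items)}[]
    first_item, rest = items[1], items[2:end]
    # Combinations that contain the first item, then those that do not
    with_first = [vcat(first_item, c) for c in combos(rest, k - 1)]
    return vcat(with_first, combos(rest, k))
end

X = (A = [1, 2, 3], B = [4, 5, 6], C = [7, 8, 9])
order = 3

cols = collect(keys(X))                                    # [:A, :B, :C]
terms = reduce(vcat, [combos(cols, k) for k in 2:order])   # pairs first, then the triple

# One new column per term: the elementwise product of its factors
new_names = Tuple(Symbol(join(t, "_")) for t in terms)
new_values = Tuple(reduce((a, b) -> a .* b, [X[v] for v in t]) for t in terms)
result = merge(X, NamedTuple{new_names}(new_values))

keys(result)     # (:A, :B, :C, :A_B, :A_C, :B_C, :A_B_C)
result.A_B       # [4, 10, 18]
result.A_B_C     # [28, 80, 162]
```

With `order = 3` and three input columns this yields four interaction columns: the three pairwise products and the single three-way product.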

If this is welcome I am happy to start a PR and take specific guidance if any.

ablaom commented 2 years ago

I tried this out, and this looks great!

A related feature request is to add transformers based on an R-style "formula", but this has been sitting around a while and is more complicated to implement.

I would enthusiastically support a PR to MLJModels (code would go here) and can provide guidance. I should mention that I hope to ultimately integrate TableTransforms.jl, so it might make sense to make your contribution there. On the other hand, that integration requires us to remove the MLJ type hierarchy, which I'm working on but which is still months away.

ablaom commented 1 year ago

closed as completed