JuliaAI / MLJModelInterface.jl

Lightweight package to interface with MLJ
MIT License

Implementing interfaces: towards a standardised approach #20

Closed: tlienart closed this issue 3 years ago

tlienart commented 4 years ago

cc: @ablaom

While helping Yaqub & co with their interface, and also while thinking about #10, I realised that the way we currently suggest users write an interface may not be ideal. Unfortunately, the examples we have are somewhat flawed:

So basically, we don't quite nail this. I think the interface should be a standalone piece of code that is as easy to maintain as possible and doesn't interfere with the rest of the package. The upside for us is that if everyone follows the same pattern, maintaining the model registry might also become smoother.

I'd like to suggest one way of doing this that could become the de facto blueprint; it is somewhat inspired by how Yaqub wrote their interface. It may not be exactly what is asked for in #10, where devs would prefer not to have to redefine structs, but I think that's only a minor nuisance.

Overview of the suggestion

Devs write a submodule MLJInterface inside their package which:

A quirk of submodules and evaluation scopes makes it necessary to load this submodule in the package __init__ function.
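The mechanics of that loading step can be tried without a package on disk. In the sketch below, `Outer`, `Inner`, `factor` and `scale` are hypothetical names, and `include_string` stands in for `include` of a separate `MLJInterface.jl` file. It only shows how a submodule can be evaluated from the parent's `__init__` and reach back into the parent via a relative import; it does not reproduce the scope clash that motivated the pattern.

```julia
module Outer
    const factor = 3

    # Source that would normally live in a separate file such as MLJInterface.jl
    const INNER_SRC = """
    module Inner
        import ..Outer            # relative import of the parent module
        scale(x) = Outer.factor * x
    end
    """

    # Evaluate the submodule into Outer once Outer has been loaded
    function __init__()
        include_string(@__MODULE__, INNER_SRC)
    end
end

Outer.Inner.scale(2)   # 6
```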

Full working example

(I also suggest we register such an example, to document the procedure explicitly and to do so better than the current examples, which are not quite representative IMO.)

Package

In FooModels/src/FooModels.jl:

module FooModels
    export FooRegressor, fit, predict # --> Not MLJ related
    export MLJInterface

    abstract type AbstractFooModel end

    mutable struct FooRegressor <: AbstractFooModel
        hp1::Float64
    end

    function fit(m::FooRegressor,
                 X::AbstractMatrix{<:Real},
                 y::AbstractVector{<:Real})
        # dumb model
        X_ = hcat(X, ones(size(X, 1)))
        return abs.(X_).^(m.hp1) \ y
    end

    function predict(m::FooRegressor,
                     coefs::AbstractVector,
                     Xnew::AbstractMatrix)

        Xnew_ = hcat(Xnew, ones(size(Xnew, 1)))
        ypred = abs.(Xnew_).^(m.hp1) * coefs
        return ypred
    end

    # NOTE: this **must** be in __init__ to avoid clashes (eval in submodule)
    function __init__()
        include(joinpath(@__DIR__, "MLJInterface.jl"))
    end
end
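The "dumb model" above is ordinary least squares on the transformed features `abs.(X).^hp1` plus an intercept column, since `A \ y` for a tall matrix `A` returns the least-squares solution. A standalone check of that, using only the standard library and restating the two functions outside the package (`features`, `foo_fit` and `foo_predict` are illustrative names, not part of FooModels): the training residuals should be orthogonal to every column of the design matrix.

```julia
using LinearAlgebra, Random

# Design matrix used by FooModels: |[X 1]|^hp1.
# The appended column of ones stays an intercept, since |1|^hp1 == 1.
features(X, hp1) = abs.(hcat(X, ones(size(X, 1)))) .^ hp1

foo_fit(hp1, X, y) = features(X, hp1) \ y                 # least-squares coefficients
foo_predict(hp1, coefs, Xnew) = features(Xnew, hp1) * coefs

Random.seed!(555)
X = randn(10, 3)
y = randn(10)

coefs = foo_fit(1.5, X, y)
yhat = foo_predict(1.5, coefs, X)

# Normal equations: A' * (y - yhat) should vanish up to round-off
A = features(X, 1.5)
maximum(abs, A' * (y - yhat))
```

If `y` lies exactly in the column space of `A`, the same fit recovers the generating coefficients.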

Interface submodule

In FooModels/src/MLJInterface.jl:

module MLJInterface

    import ..FooModels # Mandatory

    import MLJModelInterface      # Mandatory
    const MMI = MLJModelInterface # Optional (convenience)

    #
    # NOTE: NO EXPORTS!
    #
    # --------------------------------------------------------------------
    #
    # Structs must be re-defined in general because we need to specify
    # constraints on the hyperparameters (`clean!`) + need to specify how
    # they subtype MLJTypes.
    # This is not needed if the dev is happy to have the integration
    # baked-in like for EvoTrees.
    #

    MMI.@mlj_model mutable struct FooRegressor <: MMI.Deterministic
        hp1::Float64 = 1.5::(_ > 0)
    end

    # NOTE: this builds an instance of the package's own FooRegressor from the
    # interface struct, so that the package's original fit and predict can be
    # called (see MMI.fit and MMI.predict below).

    function copy_model(m::FooRegressor)
        fieldvalues = (getfield(m, n) for n in fieldnames(FooRegressor))
        return FooModels.FooRegressor(fieldvalues...)
    end

    function MMI.fit(m::FooRegressor, verb::Int, X, y)
        Xm = MMI.matrix(X)
        # fit using original fit
        fitresult = FooModels.fit(copy_model(m), Xm, y)
        # MLJ expects the triple (fitresult, cache, report)
        return fitresult, nothing, NamedTuple()
    end

    function MMI.predict(m::FooRegressor, fitres, Xnew)
        Xm = MMI.matrix(Xnew)
        # call the package's original predict on a copy of the model
        rawpred = FooModels.predict(copy_model(m), fitres, Xm)
        return rawpred
    end

    MMI.metadata_pkg(FooRegressor;
        name    = "FooModels",
        uuid    = "8bca9b16-07a7-472b-bdc5-6335965e357e",
        julia   = true,
        license = "MIT",
        is_wrapper = false)

    MMI.metadata_model(FooRegressor;
        input   = MMI.Table(MMI.Continuous),
        target  = AbstractVector{MMI.Continuous},
        weights = false,
        descr   = "Foo regressor",
        path    = "..."
        )
end
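For readers new to `@mlj_model`: the annotation `1.5::(_ > 0)` gives `hp1` both a default and a constraint, and the macro generates a keyword constructor plus a `clean!` method that warns and resets an invalid value to its default. The plain-Julia sketch below hand-writes roughly that behaviour, together with the field-by-field conversion done by `copy_model`. It is not the macro's actual expansion; `PkgRegressor`, `IfaceRegressor`, `HP1_DEFAULT`, `check_hp1!` and `to_pkg` are hypothetical names.

```julia
# Hypothetical stand-in for the package struct (FooModels.FooRegressor)
mutable struct PkgRegressor
    hp1::Float64
end

# Hypothetical stand-in for the MLJ-facing struct (MLJInterface.FooRegressor)
mutable struct IfaceRegressor
    hp1::Float64
end

const HP1_DEFAULT = 1.5

# Hand-written analogue of the constraint `1.5::(_ > 0)`: warn and reset an
# invalid value to the default; return the warning text ("" if none).
function check_hp1!(m::IfaceRegressor)
    msg = ""
    if !(m.hp1 > 0)
        msg = "Constraint `hp1 > 0` failed; using default: hp1=$(HP1_DEFAULT)."
        m.hp1 = HP1_DEFAULT
    end
    isempty(msg) || @warn msg
    return msg
end

# Keyword constructor with a default, validating on construction
function IfaceRegressor(; hp1::Real = HP1_DEFAULT)
    m = IfaceRegressor(Float64(hp1))
    check_hp1!(m)
    return m
end

# Same idea as `copy_model`: rebuild the package struct from the interface
# struct's fields, assuming both declare the same fields in the same order.
function to_pkg(m::IfaceRegressor)
    fieldnames(IfaceRegressor) == fieldnames(PkgRegressor) ||
        throw(ArgumentError("field layouts differ"))
    return PkgRegressor((getfield(m, n) for n in fieldnames(IfaceRegressor))...)
end

to_pkg(IfaceRegressor(hp1 = 2.0)).hp1   # 2.0
IfaceRegressor(hp1 = -1.0).hp1          # 1.5, after a warning
```

Relying on identical field order keeps the conversion generic; if the two structs ever diverge, an explicit constructor call per field is the safer choice.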

Tester script

### Script

using Pkg; Pkg.activate(".");
using FooModels, Random, Test;

Random.seed!(555)

X = randn(10, 3)
y = randn(10)

m = FooRegressor(1.5)

c = fit(m, X, y)

# =====================

import MLJBase

m = MLJInterface.FooRegressor(; hp1=1.5)

Xt  = MLJBase.table(X)
fr, = MLJBase.fit(m, 1, Xt, y)

info = MLJBase.info_dict(m)

@test fr ≈ c
@test info[:input_scitype] == MLJBase.Table(MLJBase.Continuous)
@test info[:package_name] == "FooModels"

Final notes

The above already works. The only remaining point is to make sure that MLJ users who want FooRegressor don't have to write MLJInterface.FooRegressor or something similar, but I believe this is handled by the model registry (?).

ablaom commented 4 years ago

@tlienart Thanks indeed for these comments. I think it's a super idea to have a complete working implementation of the API in a third-party DummyModels package. And I don't have any objections to the format you suggest, although:

Ideally, the DummyModels package should include a DummyDeterministicClassifier, as this is the case that causes the most challenges, but anything is going to be better than nothing.

ablaom commented 3 years ago

I think the newly created template repo MLJExampleInterface.jl addresses at least the gist of the issue raised here.