JuliaAI / MLJModels.jl

Home of the MLJ model registry and tools for model queries and mode code loading
MIT License
80 stars 27 forks source link

Disintegration of MLJModels #244

Closed ablaom closed 3 years ago

ablaom commented 4 years ago

Maintenance of MLJModels has become increasingly burdensome for several reasons. Perhaps the biggest problem is that it's centralised approach to providing model API implementations ("glue code") means:

While https://github.com/JuliaLang/Pkg.jl/issues/1285 may help with the second and third issue, I don't think that is close to being resolved. We have also observed elsewhere that code loading using Requires.jl can be slower than otherwise. (And there is https://github.com/alan-turing-institute/MLJModels.jl/issues/243)

While the plan has always been for all algorithm-providing packages to implement their MLJ interfaces natively, this is not going to happen quickly. In the meantime, it would be good to address the issues above.

In discussions of the core team, @tlienart has suggested the following remedy: Move the glue code for each package X into its own repository Xglue, with its own testing, and make X an ordinary dependency of Xglue. (The package Xglue would be purely a "utility" package and essentially invisible to general users.)

Such migrations could be performed incrementally and I believe each migration would trigger only a patch release, basically because the model registry (which tracks where to find glue code) is part of MLJModels (and so is opaque to MLJ).

I think this a good idea and propose beginning this disintegation; See TODO list. cc @DilumAluthge

ablaom commented 4 years ago

See also @tlienart 's earlier post at https://github.com/alan-turing-institute/MLJ.jl/issues/276

tlienart commented 4 years ago

Great, and just to stress it, we don't want this to be the rule, we want this to be a temporary solution and hope that the packages for which we end up writing glue code will integrate and own said glue code (we understand that some of these are slow moving packages with a lot of legacy and may be slow integrating stuff...)

We definitely do not want other package devs with brand new cool packages to suggest a glue package. They should just own the interface like ParallelKMeans, EvoTrees or MLJLinearModels

DilumAluthge commented 4 years ago

FWIW, we now have support for storing multiple registered Julia packages inside subdirectories a single Git repository.

So e.g. if I already have a Git repository for my package MyPackage.jl, I can now make a subdirectory inside that repository, and inside that subdirectory I can store a package called MyPackageGlue.jl.

DilumAluthge commented 4 years ago

FWIW, we now have support for storing multiple registered Julia packages inside subdirectories a single Git repository.

So e.g. if I already have a Git repository for my package MyPackage.jl, I can now make a subdirectory inside that repository, and inside that subdirectory I can store a package called MyPackageGlue.jl.

This works exactly the same as if MyPackage.jl and MyPackageGlue.jl are stored in different Git repositories. But I figure that at least some people will find it easier/more convenient to store MyPackage.jl and MyPackageGlue.jl in the same Git repository.

tlienart commented 4 years ago

that's great! though here I think that having separate repos has the advantage that when eventually the original package (e.g. MultivariateStats) owns the glu code, we can just archive the repo whereas here you'd have to remove a subrepos which might be messier to maintain overall (?)

DilumAluthge commented 4 years ago

So it's not a subrepo or anything fancy. It's just a regular folder. So if you want to move the glue code to the main package, just rename and move the files.

DilumAluthge commented 4 years ago

Does that explanation make sense?

Here's an example repo with two packages: https://github.com/DilumAluthge/FooBar

There's nothing fancy, just regular folders. If you want to move the Bar package code into the Foo package code, just move/rename the files. If you want to delete the Bar package entirely, just delete the files.

ablaom commented 4 years ago

@DilumAluthge How do release notes work in a Repo hosting multiple packages?

DilumAluthge commented 4 years ago

The idea is that you would make tags that look like this:

In the appropriate tag, you have release notes for the relevant package only.

DilumAluthge commented 4 years ago

You can have any system you want of course. That is just a suggestion.

tlienart commented 4 years ago

I think this is cool but I also think that in this particular case having separate repos is simple & would help see clearly where the code ought to be & the git history, I have a feeling this will cause more headaches especially for new maintainers.

Our end goal here is that this code ends up outside MLJModels and is integrated in other packages so I think having subfolders (I understand it's not subrepos) just does not seem to be the right approach here even if it's fully supported by Pkg.

DilumAluthge commented 4 years ago

Yeah, given the long term goal here, I think it makes sense to keep these "glue" packages in separate Git repositories.

ablaom commented 4 years ago

Packages with implementation code to migrate out to separate package:

Those marked with * are candidates for algorithm-providing package to host model implementation code.

Others get migrated to new package called "MLJPackageInterface".

Important

The package_name trait should stay the same but the load_path traits will need to reflect new location of the model implementation code. For example:

name = "Birch",
package_name = "ScikitLearn",
load_path = "MLJScikitLearnInterface.Birch"

cc @tlienart

DilumAluthge commented 4 years ago

The MLJ.jl README has a very nice visualization of the dependency graph, stored in the repo at https://github.com/alan-turing-institute/MLJ.jl/blob/master/material/MLJ_stack.svg

As we break up MLJModels and MLJBase, we should definitely remember to keep this image up-to-date.

ablaom commented 3 years ago

All done 🎉

Thanks to all who helped in this large project, especially @OkonSamuel @tlienart and @ExpandingMan