JuliaAI / MLJModels.jl

Home of the MLJ model registry and tools for model queries and mode code loading
MIT License
78 stars 27 forks source link

Add PartitionedLS model to the registry #552

Closed boborbt closed 2 months ago

boborbt commented 3 months ago

Hi, I would like to have PartitionedLS added to the registry.

A PartitionedLS model solves a constrained least squared problem where the features are partitioned by the user into groups. The solution is constrained to have all features in a group contributing in the same "direction" (i.e., positively or negatively) to the solution. Also, the solution is given in terms of two kind of variables: $\beta$ variables specify how much and in which direction each group contributes to the solution, $\alpha$ variables specify how much each feature contributes to each group.

The implementation is here. It supports MLJ interface since version 1.0.0.

The code supporting the MLJ interface can be found here.

More information about the technique can be found in this paper (currently submitted to the Machine Learning Journal after a major revision).

ablaom commented 2 months ago

Great to have this contribution, thanks.

A cursory look at the code suggests this looks good. Can you please bring the the hyperparmeter struct docstring up to spec?

@ablaom To do:

ablaom commented 2 months ago

Waiting for:

boborbt commented 2 months ago

Hi,

A cursory look at the code suggests this looks good. Can you please bring the the hyperparmeter struct docstring up to spec?

I've tried to follow the spec, but I was not able to use the $(MLJModelInterface.doc_header(...)) call. For some reason if I include it precompiling the package fails asserting that MLJModelInterface is not defined (regardless to importing or using it).

To overcome the problem I updated the docstring for PartLS to match the required format. I left the docstring I was trying to use at the end of the file, with the only modification that $(MLJModelInterface... is commented out. If you have any suggestion about how to fix the problem I will be glad to incorporate it.

You can find the updated files in commit bcafc93.

boborbt commented 2 months ago

Waiting for:

This has been fixed in commit 6b9a3f7.

ablaom commented 2 months ago

Thanks @boborbt for that. Would you consider updating the docstring as indicated above?

ablaom commented 2 months ago

Okay, I see this is done, many thanks! Please see my proposed update, https://github.com/ml-unito/PartitionedLS.jl/pull/7.

boborbt commented 2 months ago

Thanks @boborbt for that. Would you consider updating the docstring as indicated above?

As I mentioned above, I tried my best, but was not able to use the $(MLJModelInterface.doc_header(...)) call. It breaks compilation and I could not figure out why -- the error message says it cannot find MMI (which is defined as an alias for MLJModelInterface).

As a workaround I reported the docstring as a "normal" docstring for the struct. If you have any hint about to overcome the compilation problem I am more than willing to incorporate the fix.

Thanks.

ablaom commented 2 months ago

I've tried to follow the spec, but I was not able to use the $(MLJModelInterface.doc_header(...)) call. For some reason if I include it precompiling the package fails asserting that MLJModelInterface is not defined (regardless to importing or using it).

Sorry, I completely missed this.

I don't think we need to worry about including the MLJModelInterface.doc_header call; it's just a convenience macro. It is used without problems for dozens of other models; sorry I can't guess the issue and don't have the bandwidth to investigate just now.

Now you have made Float64 generic, I suggest we amend the docstring as follows:

- `X`: any matrix with element type `Float64`, or any table with columns of type `Float64`

is replaced with

- `X`: any matrix or table with `Continuous` element scitype. Check column scitypes of a table `X` with `schema(X)`.

- `y`: any vector with `Continuous` element scitype. Check scitype with `scitype(y)`. 

Then I think we're ready to roll.

boborbt commented 2 months ago

I just updated the documentation as suggested (see commit b1ca7ba9).

By the way, thanks for all the suggestions you gave me. It was worth registering to MLJ just for the free code review it comes with it 😉.

boborbt commented 2 months ago

Any news about this issue? I believe everything needed is in place now.

ablaom commented 2 months ago

Sorry, I posted an comment but must have forgotten to click "comment".

I'm waiting for you to tag a new release to make the doc strings live. The registration process grabs the doc strings from the latest tagged release to include in the registry metadata (which is searchable from MLJ without loading model-providing code) and appears here also.

Please ping me when this is ready.

boborbt commented 2 months ago

No problem @ablaom, I just tagged version 1.0.4, registered on Julia general registry and generated a new release on github. The tagged version is this one.

boborbt commented 2 months ago

Fixed the issue you opened yesterday.

boborbt commented 2 months ago

I would like to add support for PrecompileTools, but to do so I would need to add a workload including fitting the whole model which implies using MLJBase.machine. This makes MLJBase a dependency of the package, which was frown upon in one of your previous messages.

What do you suggest? Do I drop support of PrecompileTools? Do I add MLJBase to the dependencies? Do you see any way to add it as a non-hard depenedency?

If you need to have a look, you find the candidate code in branch precompile-tools.

ablaom commented 2 months ago

I suggest you only add a precompile load for your "native" API. If your native API depends on MLJBase to work, then we ought to be able to fix that. For example, if you use MLJModelInterface.matrix (which won't run properly without MLJBase loaded) then you can use import Tables; Tables.matrix instead. Let me know what causes you issues.

There is a known issue preventing precompilation loads for MLJBase having much impact: https://github.com/JuliaAI/MLJBase.jl/pull/900 . This has to do with the way we made MLJBase an "optional" dependency for model-providing packages before Julia introduced weak dependencies and package extensions (and recompilation workloads) to do this. And we can't "modernise" without breaking MLJModelInterface, which we really don't want to do right now.

I think this issue would carry over to your case: If you make MLJBase a hard dependency, then you won't actually realize much benefit.

I'm no expert on this issue, but this is my general impression.

ablaom commented 2 months ago

https://alan-turing-institute.github.io/MLJ.jl/dev/models/PartLS_PartitionedLS/

ablaom commented 2 months ago

Thanks @boborbt for your patience with the model adding process. It was helpful to have the prompt responses to my suggestions to get this sorted quickly. Congratulations again on the new package.

If you have not already done so, you may like to announce it here at some point.