JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/

Large models name change in BetaML #963

Closed sylvaticus closed 2 years ago

sylvaticus commented 2 years ago

Hello, this is just to pre-announce a large change to BetaML model names, made to increase consistency.

These are in master but not yet in any release, so there is still time to amend them:

| BetaML name | MLJ Interface |
|---|---|
| PerceptronClassifier | LinearPerceptron |
| KernelPerceptronClassifier | KernelPerceptron |
| PegasosClassifier | Pegasos |
| DecisionTreeEstimator | DecisionTreeClassifier, DecisionTreeRegressor |
| RandomForestEstimator | RandomForestClassifier, RandomForestRegressor |
| NeuralNetworkEstimator | NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier |
| GMMRegressor1 | |
| GMMRegressor2 | GaussianMixtureRegressor |
| KMeansClusterer | KMeans |
| KMedoidsClusterer | KMedoids |
| GMMClusterer | GaussianMixtureClusterer |
| FeatureBasedImputer | SimpleImputer |
| GMMImputer | GaussianMixtureImputer |
| RFImputer | RandomForestImputer |
| UniversalImputer | GeneralImputer |
| MinMaxScaler | |
| StandardScaler | |
| Scaler | |
| PCA | |
| OneHotEncoder | |
| OrdinalEncoder | |

Note that the NN models, in their default formulation, do not require autodiff. Also, I am starting to document the various MLJ models in detail using standard struct docstrings, if I interpreted the warning message correctly.

ablaom commented 2 years ago

> Also, I am starting to document the various MLJ models in detail using standard struct docstrings, if I interpreted the warning message correctly.

Yes, you are interpreting this correctly. BTW, our technical writer has gotten quite busy on other packages, so if you could do these yourself, that would be awesome. A couple of notes:

Tag me in your PRs and I will double-check.

bear-jordan commented 2 years ago

Hi @sylvaticus, I am happy to help update docstrings if you need a hand. My email is on my profile page, so feel free to contact me there.

sylvaticus commented 2 years ago

Hi there, I have finally released BetaML v0.9 with the new and renamed MLJ interface models. As for the documentation of the MLJ interface models, I haven't touched anything, to avoid conflicting with the work of @bear-jordan. In any case, the new docs can go in v0.9.1, as they are non-breaking. @ablaom: let me know whether I can already update the page https://alan-turing-institute.github.io/MLJ.jl/dev/list_of_supported_models/ or whether you first need to do something to account for this new BetaML version.

bear-jordan commented 2 years ago

Hi Antonello,

I will share some progress on the docs this week. Had to travel for work last week, so things got a bit hectic.

Best, Bear
ablaom commented 2 years ago

@sylvaticus Looks like I need to update the MLJ model registry first. Will get onto that today.

ablaom commented 2 years ago

While doing this I noticed some traces of "MissingImputator", which I'm guessing is an obsolete model from BetaML?

julia> using MLJModels # after updating registry
julia> models() do m
       m.package_name == "unknown"
       end
1-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = MissingImputator, package_name = unknown, ... )

MLJModels builds the registry by looking for all subtypes of Model.

Perhaps BetaML has some residual obsolete code somewhere?
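
To illustrate how a deprecated type can still turn up in a subtype-based scan, here is a minimal sketch. It assumes discovery amounts to a recursive walk over the subtypes of an abstract model type; it is not MLJModels' actual code, and every type name in it (`ToyModel`, `ToyUnsupervised`, `ToyImputer`, `ToyDeprecatedImputer`) is a hypothetical stand-in.

```julia
using InteractiveUtils: subtypes

# Hypothetical stand-ins for an abstract model hierarchy (not MLJ's real types).
abstract type ToyModel end
abstract type ToyUnsupervised <: ToyModel end

struct ToyImputer <: ToyUnsupervised end
struct ToyDeprecatedImputer <: ToyUnsupervised end  # deprecated, but still defined

# Recursively collect every concrete subtype of `T`.
function concrete_subtypes(T::Type)
    found = Type[]
    for S in subtypes(T)
        if isabstracttype(S)
            append!(found, concrete_subtypes(S))  # descend into abstract branches
        else
            push!(found, S)                       # concrete type: record it
        end
    end
    return found
end

discovered = concrete_subtypes(ToyModel)
```

In this sketch `discovered` contains `ToyDeprecatedImputer` as well as `ToyImputer`: any type that is still defined is found, whether or not it is meant to be exposed.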

sylvaticus commented 2 years ago

Yes, I left it as deprecated with the intention to remove it in BetaML 0.9.

ablaom commented 2 years ago

Okay. Looks like only the package metadata for that model has been changed, which probably won't break much, if anything. And your release is tagged breaking anyway.

julia> info("MissingImputator")
(name = "MissingImputator",
 package_name = "unknown",
 is_supervised = false,
 abstract_type = MLJModelInterface.Unsupervised,
 deep_properties = (),
 docstring = "Impute missing values using an Expectation-Maximis...",
 fit_data_scitype =
     Tuple{ScientificTypesBase.Table{<:AbstractVector{<:Union{Missing, ScientificTypesBase.Continuous}}}},
 human_name = "missing imputator",
 hyperparameter_ranges =
     (nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing, nothing),
 hyperparameter_types = ("Int64",
                         "AbstractVector{Float64}",
                         "Symbol",
                         "Float64",
                         "Float64",
                         "Float64",
                         "String",
                         "BetaML.Api.Verbosity",
                         "Random.AbstractRNG"),
 hyperparameters = (:K,
                    :initial_probmixtures,
                    :mixtures,
                    :tol,
                    :minimum_variance,
                    :minimum_covariance,
                    :initialisation_strategy,
                    :verbosity,
                    :rng),
 implemented_methods = [:fit, :transform],
 inverse_transform_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:Union{Missing, ScientificTypesBase.Continuous}}},
 is_pure_julia = false,
 is_wrapper = false,
 iteration_parameter = nothing,
 load_path = "BetaML.Imputation.MissingImputator",
 package_license = "unknown",
 package_url = "unknown",
 package_uuid = "unknown",
 predict_scitype = ScientificTypesBase.Unknown,
 prediction_type = :unknown,
 reporting_operations = (),
 reports_feature_importances = false,
 supports_class_weights = false,
 supports_online = false,
 supports_training_losses = false,
 supports_weights = false,
 transform_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}},
 input_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:Union{Missing, ScientificTypesBase.Continuous}}},
 target_scitype = ScientificTypesBase.Unknown,
 output_scitype =
     ScientificTypesBase.Table{<:AbstractVector{<:ScientificTypesBase.Continuous}})

ablaom commented 2 years ago

This PR completes the model registry update: https://github.com/JuliaRegistries/General/pull/69552.

Be great to get a PR for the MLJ doc change (list of models)!

sylvaticus commented 2 years ago

Done: https://github.com/alan-turing-institute/MLJ.jl/pull/968. By the way, I don't know why, in your snippet, info("MissingImputator") reports that the model is NOT pure Julia and that the licence is unknown. In the root file of my package I "register" all the MLJ model interfaces as:

function __init__()
    MMI.metadata_pkg.(MLJ_INTERFACED_MODELS,
        name       = "BetaML",
        uuid       = "024491cd-cc6b-443e-8034-08ea7eb7db2b",     # see your Project.toml
        url        = "https://github.com/sylvaticus/BetaML.jl",  # URL to your package repo
        julia      = true,     # is it written entirely in Julia?
        license    = "MIT",    # your package license
        is_wrapper = false,    # does it wrap around some other package?
    )
end

It seems some of this info is not applied.

ablaom commented 2 years ago

julia> BetaML.MLJ_INTERFACED_MODELS
(LinearPerceptron, KernelPerceptron, Pegasos, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor, KMeans, KMedoids, GaussianMixtureClusterer, SimpleImputer, GaussianMixtureImputer, RandomForestImputer, GeneralImputer, NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier, GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor)

julia> BetaML.MissingImputator in ans
false
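
The check above suggests why the metadata stayed at its defaults: the registration call is broadcast only over `MLJ_INTERFACED_MODELS`, and `MissingImputator` is not in that tuple. The following is a minimal sketch of that mechanism. It assumes package-level traits fall back to values such as "unknown" and `false` for any type that was never registered, and it does not reproduce MLJModelInterface's implementation. All names in it (`DemoModel`, `RegisteredModel`, `ForgottenModel`, `INTERFACED`, "DemoPkg") are hypothetical.

```julia
# Hypothetical model hierarchy (stand-ins, not BetaML or MLJ types).
abstract type DemoModel end
struct RegisteredModel <: DemoModel end
struct ForgottenModel  <: DemoModel end   # defined, but left out of the tuple below

# Fallback trait values, used by any type that is never registered.
package_name(::Type{<:DemoModel})  = "unknown"
is_pure_julia(::Type{<:DemoModel}) = false

# Only the types listed here receive package metadata.
const INTERFACED = (RegisteredModel,)

# Define more specific trait methods for each listed type.
for M in INTERFACED
    @eval package_name(::Type{<:$M})  = "DemoPkg"
    @eval is_pure_julia(::Type{<:$M}) = true
end

registered_name = package_name(RegisteredModel)
forgotten_name  = package_name(ForgottenModel)
```

Under these assumptions `ForgottenModel` keeps the fallback values, which matches the `package_name = "unknown"` and `is_pure_julia = false` entries reported for `MissingImputator` earlier in the thread.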

ablaom commented 2 years ago

Closed as completed.