JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/

Improvement in the Preparing Data part #964

Closed lucasmsoares96 closed 2 years ago

lucasmsoares96 commented 2 years ago

Hello everyone. First, I want to thank you for the work you are doing here. I'd like to suggest an improvement to the Preparing Data part of the MLJ documentation: I am missing some functionality that scikit-learn provides.

Topics covered in the MLJ documentation are:

In scikit-learn they are:

ablaom commented 2 years ago

Thanks for the positive feedback.

Most of these are actually implemented and documented here:

https://github.com/alan-turing-institute/MLJ.jl/blob/master/docs/src/transformers.md

There is an active PR to generate polynomial features (https://github.com/JuliaAI/MLJModels.jl/pull/478).

For an up-to-date list of built-in preprocessing transformers, follow this workflow:

julia> using MLJModels
julia> models() do m
       !m.is_supervised && m.package_name=="MLJModels"
       end
10-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = ContinuousEncoder, package_name = MLJModels, ... )
 (name = FeatureSelector, package_name = MLJModels, ... )
 (name = FillImputer, package_name = MLJModels, ... )
 (name = OneHotEncoder, package_name = MLJModels, ... )
 (name = Standardizer, package_name = MLJModels, ... )
 (name = UnivariateBoxCoxTransformer, package_name = MLJModels, ... )
 (name = UnivariateDiscretizer, package_name = MLJModels, ... )
 (name = UnivariateFillImputer, package_name = MLJModels, ... )
 (name = UnivariateStandardizer, package_name = MLJModels, ... )
 (name = UnivariateTimeTypeToContinuous, package_name = MLJModels, ... )

julia> doc("OneHotEncoder") # show the detailed docstring for a model

Feel free to open separate feature requests for any items that are still missing.