Open tlienart opened 4 years ago
There's also this: https://github.com/joshday/Telperion.jl
Continuing the discussion started by @indymnv at https://github.com/alan-turing-institute/MLJ.jl/issues/970:
Existing MLJ transformers are documented here, with the exception of InteractionTransformer, which was recently added to MLJModels but is not yet documented or re-exported by MLJ.jl. Here's the list:
julia> using MLJModels
julia> models() do m
           m.package_name == "MLJModels" &&
               !m.is_supervised
       end
11-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
(name = ContinuousEncoder, package_name = MLJModels, ... )
(name = FeatureSelector, package_name = MLJModels, ... )
(name = FillImputer, package_name = MLJModels, ... )
(name = InteractionTransformer, package_name = MLJModels, ... )
(name = OneHotEncoder, package_name = MLJModels, ... )
(name = Standardizer, package_name = MLJModels, ... )
(name = UnivariateBoxCoxTransformer, package_name = MLJModels, ... )
(name = UnivariateDiscretizer, package_name = MLJModels, ... )
(name = UnivariateFillImputer, package_name = MLJModels, ... )
(name = UnivariateStandardizer, package_name = MLJModels, ... )
(name = UnivariateTimeTypeToContinuous, package_name = MLJModels, ... )
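For readers new to these transformers, here is a minimal sketch of the usual machine workflow, using Standardizer from the list above. The toy table and its column names are made up for illustration, and it assumes an MLJ version that re-exports Standardizer.

```julia
using MLJ

# Toy column table; the column names and values are illustrative only.
X = (height = [1.0, 2.0, 3.0], weight = [10.0, 20.0, 60.0])

# Bind the transformer to data, learn the per-column mean and standard deviation.
mach = machine(Standardizer(), X)
fit!(mach, verbosity=0)

# Apply the learned rescaling, then undo it again.
W = transform(mach, X)
X_back = inverse_transform(mach, W)
```

The same `machine` / `fit!` / `transform` pattern should apply to the other unsupervised models listed, although not all of them implement `inverse_transform`.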
A "fancier" version of InteractionTransformer, based on R-style "formulas", has been planned, but no one has really found the time to work on it.
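As a rough illustration of what the existing InteractionTransformer does, the sketch below generates second-order interaction terms for a small table. The table is invented for the example, the name is qualified with MLJModels because the transformer may not be re-exported by MLJ.jl in your version, and the expectation that product columns are named like A_B is my reading of the MLJModels docstring rather than something stated in this thread.

```julia
using MLJBase, MLJModels

# Toy table: three numeric columns and one non-numeric column.
X = (A = [1, 2, 3], B = [4, 5, 6], C = [7, 8, 9], D = ["x1", "x2", "x3"])

# Static transformer: no training data is needed to build the machine.
it = MLJModels.InteractionTransformer(order=2)
mach = machine(it)

# Original columns are kept; products of pairs of numeric columns are appended.
out = transform(mach, X)
```

A formula-based variant would presumably let the user name the interactions they want instead of generating every combination up to `order`.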
There is a project in progress to roll out a feature_importance method for models that support it, with the idea of adding feature selection tools, such as recursive feature elimination.
TableTransforms.jl, referenced by @juliohm, is very active but not yet integrated with MLJ, although we are working towards doing so in the future (at least several months off). I think that is a good place to contribute generic table transformers, such as encoders. Some feature engineering tools, such as RFE, will probably not make sense there, as they require supervised learners, for example.
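To make concrete why RFE needs a supervised learner, here is a plain-Julia sketch of the idea. It is not an MLJ or TableTransforms.jl API: `feature_scores` and `rfe_sketch` are hypothetical names, and the "importance" used here (absolute least-squares coefficient on unscaled columns) is just a stand-in for whatever a real model would report.

```julia
using LinearAlgebra

# Hypothetical importance score: absolute value of each least-squares coefficient.
# A real implementation would ask a fitted supervised model for its importances.
feature_scores(X, y) = abs.(X \ y)

# Repeatedly refit on the remaining columns and drop the lowest-scoring one.
function rfe_sketch(X::AbstractMatrix, y::AbstractVector,
                    names::Vector{Symbol}; keep::Int=1)
    active = collect(1:size(X, 2))
    while length(active) > keep
        scores = feature_scores(X[:, active], y)
        deleteat!(active, argmin(scores))
    end
    return names[active]
end

# Toy data: the target depends on columns 1 and 3 only.
X = [1.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 1.0; 1.0 1.0 1.0; 2.0 -1.0 0.5]
y = 2 .* X[:, 1] .+ 5 .* X[:, 3]
selected = rfe_sketch(X, y, [:a, :b, :c]; keep=2)
```

Each pass refits against the target `y`, which is the step a purely table-to-table transformer has no access to.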
@indymnv It would be helpful if you can identify specific encoders or other tools you use frequently that are missing from MLJ (or TableTransforms.jl) so they can be prioritised.
@ablaom Thanks for all the information. In my ML work I use the following encoders a lot:
For categorical variables:
coerce from ScientificTypes.jl
For dates and other cyclic variables:
For some numerical variables:
UnivariateDiscretizer
Transformations:
TransformedTargetModel wrapper
UnivariateBoxCoxTransformer
Standardization and Normalization: @ablaom says done - Standardizer
Feature Selection:
For now, in Julia I have only used the one-hot encoder; I have not checked the other transformations.
[Edit]: For context, I frequently work with linear/logistic regression models, decision trees, random forests and GBMs.
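Two of the items mentioned in the list, coerce and the TransformedTargetModel wrapper, can be sketched as follows. The data is invented, and DeterministicConstantRegressor is used only because it ships with MLJ and keeps the example self-contained; any deterministic regressor should slot in the same way.

```julia
using MLJ

# Ordinal encoding: declare a string column to be an ordered categorical.
sizes = coerce(["small", "large", "medium", "small"], OrderedFactor)

# Target transformation: train on log(y), then map predictions back with exp.
X = (x = [1.0, 2.0, 3.0],)
y = [1.0, 10.0, 100.0]
tmodel = TransformedTargetModel(
    DeterministicConstantRegressor();
    transformer = y -> log.(y),
    inverse = z -> exp.(z),
)
mach = machine(tmodel, X, y)
fit!(mach, verbosity=0)
yhat = predict(mach, X)  # predictions are on the original scale of y
```

Note that `coerce` orders the levels lexicographically by default, so a meaningful order such as small < medium < large has to be set separately.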
Thanks @indymnv. That's most helpful. PRs for missing items welcome 😉
I stumbled upon https://github.com/matthieugomez/PairsMacros.jl today and it seems to be close to what we discussed with @vollmersj with respect to defining new columns with a formula-like syntax.
@matthieugomez sorry to ping you here, but would you be interested in something like PairsMacros for general-purpose feature engineering that works with MLJ?