dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

[feature request] Dimensionality Reduction API #5183

Open · wbadry opened this issue 4 years ago

wbadry commented 4 years ago

Feature Reduction

It would be great if a dimensionality reduction API could be added to ML.NET. Reducing the number of features would be a major advantage in shortening training time.

antoniovs1029 commented 4 years ago

Thanks for submitting your request. Can you please tell us which dimensionality reduction algorithms you would like to see included?

By the way, ML.NET does provide a PCA transform (`ProjectToPrincipalComponents`): https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.pcacatalog.projecttoprincipalcomponents?view=ml-dotnet

wbadry commented 4 years ago

PCA is great, but in my experience it can destroy class separation in both binary and multiclass problems, because it keeps the directions of highest variance without looking at the class labels.
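To illustrate the point above, here is a minimal pure-Python sketch (not ML.NET code, and not from this thread) on a hypothetical toy dataset. Both classes share a wide spread along x and differ only by a small offset along y. The helper names `covariance_2d`, `top_eigenvector_2d` and `project` are illustrative. The leading principal component ends up along x, so projecting onto it makes the two classes indistinguishable.

```python
import math

def covariance_2d(points):
    """Population covariance of 2-D points, returned as (a, b, c) for [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    return a, b, c

def top_eigenvector_2d(a, b, c):
    """Unit eigenvector of the symmetric matrix [[a, b], [b, c]] for its largest eigenvalue."""
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = lam - c, b  # satisfies (A - lam * I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)  # matrix is already diagonal
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

def project(points, direction):
    """Dot product of each point with a direction vector (a 1-D projection)."""
    return [p[0] * direction[0] + p[1] * direction[1] for p in points]

# Hypothetical toy data: wide spread along x, classes differ only by y = -1 vs y = +1.
xs = [-10, -5, 0, 5, 10]
class0 = [(x, -1.0) for x in xs]
class1 = [(x, 1.0) for x in xs]

pc1 = top_eigenvector_2d(*covariance_2d(class0 + class1))
proj0 = project(class0, pc1)
proj1 = project(class1, pc1)
print(pc1)             # (1.0, 0.0): PCA keeps the high-variance x axis
print(proj0 == proj1)  # True: both classes collapse onto identical values
```

PCA is unsupervised, so nothing stops it from discarding the one low-variance direction that carries the label information.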

This is a MATLAB toolbox by Laurens van der Maaten that covers many dimensionality reduction techniques.

As a start, since I am aware your queue is already full, LDA (linear discriminant analysis) and GDA (generalized discriminant analysis) would be very good candidates. I tried both and got well-projected features with decent class separation.

This is a work in progress. Using such a projection, I get a nice separation between healthy and COVID-19 patients based on 110 features. CoVID Research
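For reference, here is a minimal pure-Python sketch of the classic two-class Fisher linear discriminant, w = S_w^-1 (m1 - m0), where S_w is the pooled within-class scatter and m0, m1 are the class means. This is the textbook formulation of plain LDA, not GDA (its kernelized generalization), and it is not ML.NET code. The data and the helper names (`mean_2d`, `scatter_2d`, `fisher_direction`, `project`) are hypothetical. Because LDA uses the labels, it picks the direction that separates the classes even when another axis has far more variance.

```python
import math

def mean_2d(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter_2d(points):
    """Scatter matrix of one class around its own mean, as (a, b, c) for [[a, b], [b, c]]."""
    mx, my = mean_2d(points)
    a = sum((p[0] - mx) ** 2 for p in points)
    b = sum((p[0] - mx) * (p[1] - my) for p in points)
    c = sum((p[1] - my) ** 2 for p in points)
    return a, b, c

def fisher_direction(class0, class1):
    """Two-class Fisher LDA direction w = S_w^-1 (m1 - m0), scaled to unit length."""
    a0, b0, c0 = scatter_2d(class0)
    a1, b1, c1 = scatter_2d(class1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter S_w
    det = a * c - b * b  # assumes S_w is invertible (det != 0)
    (m0x, m0y), (m1x, m1y) = mean_2d(class0), mean_2d(class1)
    dx, dy = m1x - m0x, m1y - m0y
    # The inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    wx = (c * dx - b * dy) / det
    wy = (-b * dx + a * dy) / det
    norm = math.hypot(wx, wy)
    return wx / norm, wy / norm

def project(points, direction):
    """Dot product of each point with a direction vector (a 1-D projection)."""
    return [p[0] * direction[0] + p[1] * direction[1] for p in points]

# Hypothetical toy data: large spread along x, classes offset along y.
class0 = [(-9, -1.5), (-3, -0.5), (3, -0.5), (9, -1.5)]
class1 = [(-9, 0.5), (-3, 1.5), (3, 1.5), (9, 0.5)]

w = fisher_direction(class0, class1)
proj0, proj1 = project(class0, w), project(class1, w)
print(w)                        # (0.0, 1.0): LDA picks the class-separating y axis
print(max(proj0) < min(proj1))  # True: the projected classes do not overlap
```

With C classes, multiclass LDA yields at most C - 1 discriminant directions, which is part of why it compresses features so aggressively.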

justinormont commented 4 years ago

@wbadry: There are various feature selection and feature compression techniques available in ML.NET.

Feature selection:

Feature compression:

You can also use the CustomMapper or Expression transforms for custom selection/compression.

Related issue: https://github.com/dotnet/machinelearning-modelbuilder/issues/702 (having AutoML try the techniques for you)