dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

CrossValidation and TrainTest for AnomalyDetection #2686

Open rogancarr opened 5 years ago

rogancarr commented 5 years ago

There are two extensions for training, TrainTestSplit and CrossValidation, that are not clearly suited for AnomalyDetection as written.

TrainTestSplit is available in AnomalyDetection because it is defined in TrainerCatalogBase, but anomaly detection scenarios often involve structured data (e.g. time series) that we don't handle. Do we disable TrainTestSplit for AnomalyDetection? Do we add support for kinds of structured data beyond what we handle now? Or do we assume that all structural problems can be solved with a SamplingKeyColumn?
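To make the distinction concrete, here is a minimal sketch in plain Python (not ML.NET code) of two ways a split can respect structure. The function names `grouped_split` and `chronological_split` are hypothetical and exist only for illustration. The first keeps all rows that share a key on the same side of the split, which is the kind of guarantee a sampling key column is meant to give. The second splits by time order, which a key-based grouping alone does not express.

```python
import random


def grouped_split(keys, test_fraction=0.2, seed=0):
    """Split row indices so rows sharing a key never straddle train and test.

    Hypothetical helper: this mimics the idea of a sampling key column,
    not the actual ML.NET implementation.
    """
    unique_keys = sorted(set(keys))
    rng = random.Random(seed)
    rng.shuffle(unique_keys)
    # Always hold out at least one key so the test set is not empty.
    n_test = max(1, round(len(unique_keys) * test_fraction))
    test_keys = set(unique_keys[:n_test])
    train, test = [], []
    for index, key in enumerate(keys):
        (test if key in test_keys else train).append(index)
    return train, test


def chronological_split(timestamps, test_fraction=0.2):
    """Split row indices so every test row is later than every train row.

    Hypothetical helper: assumes timestamps are comparable and distinct.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = len(order) - max(1, round(len(order) * test_fraction))
    return order[:cut], order[cut:]
```

A random row-level split offers neither guarantee: rows from the same group, or from the future relative to the test period, can end up in the training set.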

CrossValidation is not currently supported for AnomalyDetection, but it could be once the TrainTestSplit issue is resolved.
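As one possible shape for such support on time-ordered data, the sketch below shows forward-chaining folds in plain Python. This is an assumption about what a structure-aware CrossValidation might look like, not an existing ML.NET API, and the name `forward_chaining_folds` is hypothetical. Each fold trains on a prefix of the series and tests on the block that follows it, so no fold trains on data from after its test period.

```python
def forward_chaining_folds(n_rows, n_folds):
    """Yield (train_indices, test_indices) for rows already sorted by time.

    Hypothetical helper: the series is cut into n_folds + 1 blocks; fold k
    trains on the first k blocks and tests on block k + 1. The last fold
    also absorbs any leftover rows.
    """
    if n_folds < 1 or n_rows < n_folds + 1:
        raise ValueError("need at least n_folds + 1 rows and n_folds >= 1")
    block = n_rows // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = block * k
        test_end = n_rows if k == n_folds else block * (k + 1)
        yield list(range(train_end)), list(range(train_end, test_end))
```

Unlike standard k-fold cross-validation, the training sets here grow from fold to fold and the test blocks never precede their training data.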

shauheen commented 5 years ago

This is a great issue and I completely agree with the suggestion. I am going to remove it from Project 13, as this can be implemented without a breaking change post-March. We still view this as a very pertinent issue in many situations.

rogancarr commented 5 years ago

@shauheen Note that if we want to drop the TrainTestSplit API as it is currently implemented from AnomalyDetection, this will be a breaking change post-March.