Open quantasm opened 4 years ago
@CESARDELATORRE FYI
@quantasm - This is related to https://github.com/dotnet/machinelearning/issues/4082, which we have identified and have in the backlog.
Adding @gvashishtha to follow up on this feature.
@quantasm Can you explain more about your use case? The stratification features in scikit-learn preserve the original distribution of the data, so you would get a 90%/10% split in both your train and test datasets.
It seems that what you are after is balance, which would generally be achieved by up- or down-sampling. Is this true?
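To illustrate what distribution-preserving stratification means, here is a minimal sketch in plain Python. It is not scikit-learn's or ML.NET's implementation; `stratified_folds` is a hypothetical helper and the 90/10 label list is made-up example data. It deals each class's rows round-robin across the folds, so every fold keeps roughly the original class ratio.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each row index to one of k folds, class by class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's rows round-robin so each fold gets an equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Made-up data: 90 rows of class 0 and 10 rows of class 1.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, k=5)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]

for number, counts in enumerate(fold_counts):
    print(number, dict(counts))
```

With this example data each of the 5 folds ends up with 18 rows of class 0 and 2 rows of class 1, which is the same 90%/10% ratio as the full dataset, not a 50%/50% one.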
There does not appear to be a way to stratify data in ML.NET. Is this likely to be implemented anytime soon?
Say I have data with an uneven predictor field, split 90% / 10%. I would like to cross-validate the data with k folds so that each fold has an even predictor split of 50% / 50% (or any other desired split).
This does not seem possible yet, but it is a major feature required for ML modelling.
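As a rough illustration of the 50% / 50% behaviour described above, the following plain-Python sketch down-samples a set of rows so that every class has as many rows as the rarest one. It is only one possible approach, not an ML.NET API; `downsample_to_balance` is a hypothetical helper and the 90/10 label list is made-up example data.

```python
import random
from collections import defaultdict


def downsample_to_balance(indices, labels, seed=0):
    """Return a subset of indices with equally many rows per class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the given row indices by their class label.
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)

    # Every class is cut down to the size of the rarest class.
    target = min(len(rows) for rows in by_class.values())

    balanced = []
    for rows in by_class.values():
        balanced.extend(rng.sample(rows, target))
    return sorted(balanced)


# Made-up data: 90 rows of class 0 and 10 rows of class 1.
labels = [0] * 90 + [1] * 10
fold_indices = list(range(len(labels)))

balanced = downsample_to_balance(fold_indices, labels)
positives = sum(labels[i] for i in balanced)
print(len(balanced), positives)
```

Here the 100 rows shrink to 20, with 10 from each class. Applying the same helper to the rows of each fold would give folds with an even split, at the cost of discarding most of the majority-class rows.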