dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

OneDAL FastForest training has an "Array dimensions exceeded supported range" exception #6927

Open 80LevelElf opened 6 months ago

80LevelElf commented 6 months ago

Describe the bug

Array dimensions exceeded supported range.
   at System.Collections.Generic.List`1.set_Capacity(Int32 value)
   at System.Collections.Generic.List`1.AddWithResize(T item)
   at Microsoft.ML.OneDal.OneDalUtils.GetTrainData(IChannel channel, Factory cursorFactory, List`1& featuresList, List`1& labelsList, Int32 numberOfFeatures)
   at Microsoft.ML.Trainers.FastTree.FastForestBinaryTrainer.TrainCoreOneDal(IChannel ch, Factory cursorFactory, Int32 featureCount)
   at Microsoft.ML.Trainers.FastTree.FastForestBinaryTrainer.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.BinaryClassificationRunner.Run(TrialSettings settings)
   at Microsoft.ML.AutoML.BinaryClassificationRunner.RunAsync(TrialSettings settings, CancellationToken ct)
   at Microsoft.ML.AutoML.AutoMLExperiment.RunAsync(CancellationToken ct)

We have seen this error many times in our internal ML.NET logs. It does not look related to training set size: the run this trace was copied from had only 20,000 training rows.

superichmann commented 4 days ago

Still happens, this time with regression:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Microsoft.ML.Trainers.FastTree.Dataset.MapFeatureToFlockAndSubFeature(Int32 feature, Int32& flock, Int32& subfeature)
   at Microsoft.ML.Trainers.FastTree.InternalRegressionTree.PopulateThresholds(Dataset dataset)
   at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainCoreOneDal(IChannel ch, Factory cursorFactory, Int32 featureCount)
   at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)

Package: Microsoft.ML.OneDal, 0.22.0-preview.24271.1

Is there any benchmark showing that oneDAL with ML.NET is actually faster (when it works)?