dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Splitter/consolidator worker encountered exception while consuming source data (DataClassification AutoML) #2591

Open rzechu opened 1 year ago

rzechu commented 1 year ago

Describe the bug

I have seen a few issues about a similar error, but all of them concern ImageClassification. Mine concerns DataClassification.

DataClassification using SQL Server View

To Reproduce

  1. DataClassification
  2. Many columns (dates, decimals, ints), 2 varchar columns, and 1 label: training works fine
  3. The same columns plus 1 more problematic varchar column (StringCol3): instant error
  4. If I replace this problematic varchar column with a constant for all rows, something like 'abc' as [StringCol3], training works fine

I can't attach the original columns because they contain sensitive data, but I anonymized and trimmed the data down to enough rows to reproduce the error (SQL scripts to create the table and insert the records are attached): aibug.zip

start multiclass classification
Evaluate Metric: MacroAccuracy
Available Trainers: SDCA,LBFGS,LGBM,FASTTREE,FASTFOREST
Training time in second: 300
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=OVA; Fitting, Kind=Info] Training learner 0
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=Converter; InitDataset, Kind=Info] Making per-feature arrays
[Source=AutoMLExperiment-ChildContext, Kind=Info] [Source=Converter; InitBoundariesAndLabels, Kind=Info] Changing data from row-wise to column-wise

Splitter/consolidator worker encountered exception while consuming source data

   at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
   at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore()
   at Microsoft.ML.Data.RootCursorBase.MoveNext()
   at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext()
   at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances)
   at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   at Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   at Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit)
   at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData)
   at Microsoft.ML.Trainers.FastTree.FastTreeBinaryTrainer.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Trainers.OneVersusAllTrainer.TrainOne(IChannel ch, ITrainerEstimator`2 trainer, RoleMappedData data, Int32 cls)
   at Microsoft.ML.Trainers.OneVersusAllTrainer.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.SweepablePipelineRunner.Run(TrialSettings settings)
   at Microsoft.ML.AutoML.SweepablePipelineRunner.RunAsync(TrialSettings settings, CancellationToken ct)
   at Microsoft.ML.AutoML.AutoMLExperiment.<RunAsync>d__24.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.MultiClassificationExperiment.<ExecuteAsync>d__14.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/MultiClassificationExperiment.cs:line 123
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLEngine.<StartTrainingAsync>d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 145

BONUS: if you make a View that replaces 1 column with empty/null values and use this View for DataClassification:

CREATE OR ALTER VIEW XYZ 
AS
SELECT
 .....
, RIGHT(StringCol3,0) AS StringCol3
.....
FROM AIBug

You will get another error:

Schema mismatch for input column 'StringCol3_CharExtractor': expected Expected known-size vector of Single, got Vector<Single>
Parameter name: inputSchema

Expected behavior

No error, or a human-readable message explaining what is wrong and how to fix it.

LittleLittleCloud commented 1 year ago

@rzechu

It's because some datetime columns in your SQL data have null values. I removed all datetime columns and training succeeded.

We'll fix that issue in ML.NET. In the meantime, as a workaround, you can either ignore the datetime columns or set their type to single, which won't affect the training result because datetime is parsed as single in the training pipeline as well.

rzechu commented 1 year ago

Thank you for the quick reply. I have worked around it using DateDiff(d, datecol, Getdate()) and null checking. Have you checked the 2nd issue?
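
For anyone preparing the data outside SQL Server, the same idea (turn each date into a day count and replace nulls before training) might look roughly like the Python sketch below. It is only an illustration, not part of Model Builder or ML.NET: the helper names, the DateCol1 column name, and the -1 sentinel for missing dates are all assumptions, and a different sentinel may suit data that contains future dates.

```python
from datetime import date
from typing import Optional

# Hypothetical sentinel used in place of NULL dates (an assumption, not an ML.NET convention).
MISSING_DAYS = -1.0


def days_since(value: Optional[date], today: date) -> float:
    """Return the number of days from `value` to `today`, or the sentinel if `value` is None."""
    if value is None:
        return MISSING_DAYS
    return float((today - value).days)


def convert_rows(rows, date_columns, today):
    """Return copies of `rows` with each listed date column replaced by a day count."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for col in date_columns:
            new_row[col] = days_since(row.get(col), today)
        converted.append(new_row)
    return converted


if __name__ == "__main__":
    sample = [
        {"DateCol1": date(2023, 1, 1), "StringCol3": "abc"},
        {"DateCol1": None, "StringCol3": "abc"},
    ]
    for row in convert_rows(sample, ["DateCol1"], date(2023, 1, 11)):
        print(row)
```

After this step the date columns are plain numbers with no nulls, so they can be treated as single-typed columns during training.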

LittleLittleCloud commented 1 year ago

Still investigating