dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

AutoML text classification experiment fails with specific text in data file #910

Closed jamiefutch closed 3 years ago

jamiefutch commented 4 years ago

System Information (please complete the following information):

Describe the bug: AutoML binary classification experiment fails when the following text is a feature column: i love going here

To Reproduce Steps to reproduce the behavior:

  1. Use this data as the data file:

     Id    text                  class
     863   i love going here     1
     794   excellent             1
     802   good                  1
     805   good contacts         1
     806   awesome               1
     807   good                  1
     808   good                  1
     809   good                  1
     810   good                  1
     811   love new location     1
     813   nice professional     1
     814   new facility nice     1
     817   very good             1
     818   very good             1
     819   very good             1
     830   bad                   0
     840   stupid person         0

  2. AutoML -> Text Classification, with columns:
       id: ignore
       text: feature
       class: label

  3. predict class column

  4. See error:

Training failed with the exception:
System.ArgumentOutOfRangeException: Could not find feature column 'Features'
Parameter name: inputSchema
   at Microsoft.ML.Trainers.TrainerEstimatorBase2.CheckInputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Trainers.TrainerEstimatorBase2.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain1.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain1.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)
   at Microsoft.ML.AutoML.Experiment2.Execute()
   at Microsoft.ML.AutoML.ExperimentBase2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator1 preFeaturizer, IProgress1 progressHandler, IRunner1 runner)
   at Microsoft.ML.AutoML.ExperimentBase2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator1 preFeaturizer, IProgress1 progressHandler)
   at Microsoft.ML.AutoML.ExperimentBase2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator1 preFeaturizer, IProgress1 progressHandler)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.<>cDisplayClass21_0.b_5() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 81
   at System.Threading.Tasks.Task1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.d21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 108
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLEngine.d_30.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 147

Expected behavior Model is trained and tested

Screenshots N/A

Additional context: Log:

2020-07-21 18:18:00.7024 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2020-07-21 18:18:00.7024 DEBUG Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2020-07-21 18:18:00.7044 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2020-07-21 18:18:00.7274 INFO | Trainer MicroAccuracy MacroAccuracy Duration #Iteration | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2020-07-21 18:18:00.7614 INFO Could not find input column 'Features' Parameter name: inputSchema (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2020-07-21 18:18:00.7614 INFO Could not find input column 'Features' Parameter name: inputSchema (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2020-07-21 18:18:00.7724 INFO Could not find feature column 'Features' Parameter name: inputSchema (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2020-07-21 18:18:00.7724 DEBUG Training failed with the exception: System.ArgumentOutOfRangeException: Could not find feature column 'Features'
Parameter name: inputSchema
   at Microsoft.ML.Trainers.TrainerEstimatorBase2.CheckInputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Trainers.TrainerEstimatorBase2.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain1.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain1.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)
   at Microsoft.ML.AutoML.Experiment2.Execute()
   at Microsoft.ML.AutoML.ExperimentBase2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator1 preFeaturizer, IProgress1 progressHandler, IRunner1 runner)
   at Microsoft.ML.AutoML.ExperimentBase2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator1 preFeaturizer, IProgress1 progressHandler)
   at Microsoft.ML.AutoML.ExperimentBase2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator1 preFeaturizer, IProgress1 progressHandler)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.<>cDisplayClass21_0.b_5() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 81
   at System.Threading.Tasks.Task1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.d21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 108
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLEngine.d_30.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 147 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)

beccamc commented 4 years ago

@jamiefutch Sorry you've experienced a problem and thanks for reporting! Just to clarify, is your data separated by tabs or spaces? Is the separator between 863 and i love going here a full tab?

jamiefutch commented 4 years ago

@beccamc Sorry, left that out. The values are tab separated. The offending line/string is: 863 i love going here 1, i.e. (escaped): 863\ti love going here\t1
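
For anyone trying to reproduce this, here is a minimal Python sketch that writes the rows from the report as a tab-separated file and checks that every line has exactly three fields. The rows are copied from the issue. The file name `repro.tsv` and the helper names `to_tsv` and `check_tsv` are made up for illustration and are not part of Model Builder.

```python
import csv
import io

# Rows copied from the issue; column order is Id, text, class.
ROWS = [
    (863, "i love going here", 1),
    (794, "excellent", 1),
    (802, "good", 1),
    (805, "good contacts", 1),
    (806, "awesome", 1),
    (807, "good", 1),
    (808, "good", 1),
    (809, "good", 1),
    (810, "good", 1),
    (811, "love new location", 1),
    (813, "nice professional", 1),
    (814, "new facility nice", 1),
    (817, "very good", 1),
    (818, "very good", 1),
    (819, "very good", 1),
    (830, "bad", 0),
    (840, "stupid person", 0),
]


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Id", "text", "class"])
    writer.writerows(rows)
    return buf.getvalue()


def check_tsv(text):
    """Return the parsed data rows; raise if a line lacks exactly three fields."""
    parsed = list(csv.reader(io.StringIO(text), delimiter="\t"))
    for line_no, fields in enumerate(parsed, start=1):
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
    return parsed[1:]  # skip the header


if __name__ == "__main__":
    tsv = to_tsv(ROWS)
    check_tsv(tsv)
    # "repro.tsv" is a hypothetical file name for the repro data file.
    with open("repro.tsv", "w", encoding="utf-8", newline="") as f:
        f.write(tsv)
```

Writing the file from code rather than pasting it avoids editors silently converting tabs to spaces, which is the ambiguity the question above was trying to rule out.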

beccamc commented 3 years ago

After the refactor, we still have an error with this dataset: image.png

image.png

I need to dig into this further.

beccamc commented 3 years ago

I've confirmed this works with this dataset: image.png