Open vardeg2017 opened 5 years ago
Is this a CLI issue or AutoML issue?
@rustd. This is a problem manifesting in auto-train (i.e. the CLI). The hint for the root cause in the output is: "Exception: System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data"
Copying a bunch more rows to the dataset resolves the issue. Therefore, I believe the only problem here is that there just isn't enough data in the input file to do a proper analysis.
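For anyone wondering why a tiny file could trigger this: AUC compares the scores of positive examples against the scores of negative ones, so it has no value when the evaluated split contains only one class. With only four rows, a train/validation split can easily end up that way. Below is a minimal Python sketch of the idea. It is not ML.NET's implementation; the `auc` function, the split, and the scores are made up for illustration.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        # No pairs to compare, so the metric has no meaningful value.
        raise ValueError("AUC is undefined without both classes")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# XOR rows (x, y, z) as in the report; z is the label.
rows = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]

# Hypothetical validation split that happens to hold only z == 0 rows.
valid = rows[2:]
labels = [z for _, _, z in valid]
scores = [0.3, 0.6]  # made-up model scores

try:
    auc(labels, scores)
except ValueError as err:
    print(err)
```

Whether this is exactly what happens inside auto-train for this file is an assumption; the sketch only shows that the metric itself cannot be computed for a single-class split.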
@greazer it would be good to parse the exception and show it first in the output window so it is more discoverable. Thoughts?
@rustd Yeah, actually, I do think that auto-train should catch errors like this and return a more human-friendly error. However, in this particular case (and probably in many others), it's not clear what the error means to the user. I suspect that there's not enough data and therefore the AUC can't be calculated, but I'm not sure that's the only reason AUC can't be calculated, nor would the user understand it. Therefore, it seems that the AutoML team should capture this type of error and return a more reasonable error to callers (like auto-train).
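As a rough illustration of the "catch it and return something friendlier" idea, here is a small Python sketch. Everything in it is hypothetical (the `HINTS` table, the `friendly_error` function, and the hint wording); it is not how the CLI or AutoML actually handles errors, and the hint is deliberately hedged because the root cause is not certain.

```python
# Hypothetical lookup table: fragment of the raw message -> friendlier hint.
HINTS = {
    "AUC is not definied when there is no positive class":
        "The evaluation split may contain no positive examples. "
        "Adding more rows of each class may help.",
}


def friendly_error(raw: str) -> str:
    """Return a hint plus the first line of the raw error, or the raw text if unknown."""
    for fragment, hint in HINTS.items():
        if fragment in raw:
            first_line = raw.splitlines()[0]
            return f"{hint}\n(Original error: {first_line})"
    return raw


raw = ("System.ArgumentOutOfRangeException: AUC is not definied when there "
       "is no positive class in the data\nParameter name: PosSample")
print(friendly_error(raw))
```

Keeping the original first line in the output means the full exception is still discoverable while the hint is shown up front.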
Model Builder 16.0.1905.641
OS: Windows 10 Pro 17134.765
Visual Studio 2019 16.1.1

I made a very simple sample: an XOR data set. I tried both CSV (","-separated) and TSV; it makes no difference. Here is my TSV data set:

x y z
1 0 1
0 1 1
1 1 0
0 0 0

When I choose binary-classification, at the train step I get this:

Inferring Columns ...
Creating Data loader ...
Loading data ...
Exploring multiple ML algorithms and settings to find you the best model for ML task: binary-classification
For further learning check: https://aka.ms/mlnet-cli
[Source=AutoML, Kind=Trace] Channel started
| Trainer Accuracy AUC AUPRC F1-score Duration #Iteration |
[Source=AutoML, Kind=Trace] Evaluating pipeline xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=AveragedPerceptronBinary{} cache=+
[Source=AutoML, Kind=Error] Pipeline crashed: xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=AveragedPerceptronBinary{} cache=+ . Exception: System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data
Parameter name: PosSample
   at Microsoft.ML.Data.EvaluatorBase`1.AucAggregatorBase`1.ComputeWeightedAuc(Double& unweighted)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
   at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.<GetAggregatorConsolidationFuncs>b__0(UInt32 stratColKey, ReadOnlyMemory`1 stratColVal, Aggregator agg)
   at Microsoft.ML.Data.EvaluatorBase`1.ProcessData(IDataView data, RoleMappedSchema schema, Func`2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries)
   at Microsoft.ML.Data.EvaluatorBase`1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)
   at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
[Source=AutoML, Kind=Trace] 1 NaN 00:00:00.6932896 xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=AveragedPerceptronBinary{} cache=+
|1 AveragedPerceptronBinary NaN NaN NaN NaN 0,7 0 |
System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data
Parameter name: PosSample
   at Microsoft.ML.Data.EvaluatorBase`1.AucAggregatorBase`1.ComputeWeightedAuc(Double& unweighted)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
   at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.<GetAggregatorConsolidationFuncs>b__0(UInt32 stratColKey, ReadOnlyMemory`1 stratColVal, Aggregator agg)
   at Microsoft.ML.Data.EvaluatorBase`1.ProcessData(IDataView data, RoleMappedSchema schema, Func`2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries)
   at Microsoft.ML.Data.EvaluatorBase`1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)
   at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
[Source=AutoML, Kind=Trace] Evaluating pipeline xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=SdcaLogisticRegressionBinary{} cache=+
[Source=AutoML, Kind=Error] Pipeline crashed: xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=SdcaLogisticRegressionBinary{} cache=+ . Exception: System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data
Parameter name: PosSample
   at Microsoft.ML.Data.EvaluatorBase`1.AucAggregatorBase`1.ComputeWeightedAuc(Double& unweighted)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
   at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.<GetAggregatorConsolidationFuncs>b__0(UInt32 stratColKey, ReadOnlyMemory`1 stratColVal, Aggregator agg)
   at Microsoft.ML.Data.EvaluatorBase`1.ProcessData(IDataView data, RoleMappedSchema schema, Func`2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries)
   at Microsoft.ML.Data.EvaluatorBase`1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)
   at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
[Source=AutoML, Kind=Trace] 2 NaN 00:00:06.9448234 xf=ColumnConcatenating{ col=Features:x,y} xf=Normalizing{ col=Features:Features} tr=SdcaLogisticRegressionBinary{} cache=+
|2 SdcaLogisticRegressionBinary NaN NaN NaN NaN 7,0 0 |
System.ArgumentOutOfRangeException: AUC is not definied when there is no positive class in the data
Parameter name: PosSample
   at Microsoft.ML.Data.EvaluatorBase`1.AucAggregatorBase`1.ComputeWeightedAuc(Double& unweighted)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Aggregator.Finish()
   at Microsoft.ML.Data.BinaryClassifierEvaluator.<>c__DisplayClass32_0.<GetAggregatorConsolidationFuncs>b__0(UInt32 stratColKey, ReadOnlyMemory`1 stratColVal, Aggregator agg)
   at Microsoft.ML.Data.EvaluatorBase`1.ProcessData(IDataView data, RoleMappedSchema schema, Func`2 activeColsIndices, TAgg aggregator, AggregatorDictionaryBase[] dictionaries)
   at Microsoft.ML.Data.EvaluatorBase`1.Microsoft.ML.Data.IEvaluator.Evaluate(RoleMappedData data)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)
   at Microsoft.ML.AutoML.BinaryMetricsAgent.EvaluateMetrics(IDataView data, String labelColumn)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
[Source=AutoML, Kind=Trace] Evaluating pipeline xf=ColumnConcatenating{ col=Features:x,y} tr=LightGbmBinary{} cache=-
[Source=AutoML, Kind=Error] Pipeline crashed: xf=ColumnConcatenating{ col=Features:x,y} tr=LightGbmBinary{} cache=- . Exception: System.ArgumentNullException: Value cannot be null.
Parameter name: items
   at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName)
   at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length)
   at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.<CreateTreeEnsembleFromInternalDataStructure>b__5_0(InternalRegressionTree tree)
   at System.Linq.Enumerable.SelectListIterator`2.ToList()
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Microsoft.ML.Trainers.FastTree.TreeEnsemble`1..ctor(IEnumerable`1 trees, IEnumerable`1 treeWeights, Double bias)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure()
   at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor()
   at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
[Source=AutoML, Kind=Trace] 3 NaN 00:00:00.1836263 xf=ColumnConcatenating{ col=Features:x,y} tr=LightGbmBinary{} cache=-
|3 LightGbmBinary NaN NaN NaN NaN 0,2 0 |
System.ArgumentNullException: Value cannot be null.
Parameter name: items
   at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName)
   at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length)
   at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.<CreateTreeEnsembleFromInternalDataStructure>b__5_0(InternalRegressionTree tree)
   at System.Linq.Enumerable.SelectListIterator`2.ToList()
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Microsoft.ML.Trainers.FastTree.TreeEnsemble`1..ctor(IEnumerable`1 trees, IEnumerable`1 treeWeights, Double bias)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure()
   at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor()
   at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
Exception occured while exploring pipelines:
Training failed with the exception: System.ArgumentNullException: Value cannot be null.
Parameter name: items
   at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName)
   at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length)
   at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.<CreateTreeEnsembleFromInternalDataStructure>b__5_0(InternalRegressionTree tree)
   at System.Linq.Enumerable.SelectListIterator`2.ToList()
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Microsoft.ML.Trainers.FastTree.TreeEnsemble`1..ctor(IEnumerable`1 trees, IEnumerable`1 treeWeights, Double bias)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure()
   at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor()
   at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
System.InvalidOperationException: Training failed with the exception: System.ArgumentNullException: Value cannot be null.
Parameter name: items
   at System.Collections.Immutable.Requires.FailArgumentNullException(String parameterName)
   at System.Collections.Immutable.ImmutableArray.Create[T](T[] items, Int32 start, Int32 length)
   at Microsoft.ML.Trainers.FastTree.RegressionTreeBase..ctor(InternalRegressionTree tree)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.<>c.<CreateTreeEnsembleFromInternalDataStructure>b__5_0(InternalRegressionTree tree)
   at System.Linq.Enumerable.SelectListIterator`2.ToList()
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Microsoft.ML.Trainers.FastTree.TreeEnsemble`1..ctor(IEnumerable`1 trees, IEnumerable`1 treeWeights, Double bias)
   at Microsoft.ML.Trainers.FastTree.TreeEnsembleModelParametersBasedOnRegressionTree.CreateTreeEnsembleFromInternalDataStructure()
   at Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.CreatePredictor()
   at Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase`4.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, AutoMLLogger logger)
   at Microsoft.ML.CLI.CodeGenerator.CodeGenerationHelper.GenerateCode()
   at Microsoft.ML.CLI.Program.<>c__DisplayClass1_0.b__0(NewCommandSettings options)
Please see the log file for more info.
Exiting ...