dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Splitter/consolidator worker exception while consuming data w/ 0.8.0-preview-27128-5 #1751

Closed rauhs closed 5 years ago

rauhs commented 5 years ago

Issue

I'm not sure how to further debug this. I probably won't have time for a minimal repro example.

Source code / logs

Exception:

IndexOutOfRangeException: Index was outside the bounds of the array.

                public void SetAll(OutPipe[] pipes)
                {
                    if (_ex != null)
                        throw Contracts.Except(_ex, "Splitter/consolidator worker encountered exception while consuming source data");
                    Contracts.Assert(Utils.Size(pipes) == _batchColumns.Length);
[...]

Exception trace:

   at System.ThrowHelper.ThrowIndexOutOfRangeException() in E:\A\_work\65\s\corefx\src\System.Memory\src\System\ThrowHelper.cs:line 43
   at Microsoft.ML.Transforms.Conversions.KeyToVectorMappingTransformer.Mapper.<>c__DisplayClass11_0.<MakeGetterOne>b__0(VBuffer`1& dst) in E:\A\_work\423\s\src\Microsoft.ML.Data\Transforms\KeyToVector.cs:line 483
   at Microsoft.ML.Runtime.Data.ColumnConcatenatingTransformer.Mapper.BoundColumn.<>c__DisplayClass18_0`1.<MakeGetter>b__0(VBuffer`1& dst) in E:\A\_work\423\s\src\Microsoft.ML.Data\Transforms\ColumnConcatenatingTransformer.cs:line 690
   at Microsoft.ML.Runtime.Data.DataViewUtils.Splitter.InPipe.Impl`1.Fill() in E:\A\_work\423\s\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 730
   at Microsoft.ML.Runtime.Data.DataViewUtils.Splitter.Consolidator.<>c__DisplayClass4_1.<ConsolidateCore>b__2() in E:\A\_work\423\s\src\Microsoft.ML.Data\Data\DataViewUtils.cs:line 417

Trace:

    Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.DataViewUtils.Splitter.Batch.SetAll(Microsoft.ML.Runtime.Data.DataViewUtils.Splitter.OutPipe[] pipes) Line 839  C#
    Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.DataViewUtils.Splitter.Cursor.MoveNextCore() Line 1114  C#
    Microsoft.ML.Core.dll!Microsoft.ML.Runtime.Data.RootCursorBase.MoveNext() Line 70   C#
    Microsoft.ML.Data.dll!Microsoft.ML.Transforms.Normalizers.NormalizingTransformer.Train(Microsoft.ML.Runtime.IHostEnvironment env, Microsoft.ML.Runtime.Data.IDataView data, Microsoft.ML.Transforms.Normalizers.NormalizingEstimator.ColumnBase[] columns) Line 375 C#
    Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.EstimatorChain<Microsoft.ML.Core.Data.ITransformer>.Fit(Microsoft.ML.Runtime.Data.IDataView input) Line 68  C#
    Microsoft.ML.Data.dll!Microsoft.ML.Runtime.Data.EstimatorChain<Microsoft.ML.Core.Data.ITransformer>.Fit(Microsoft.ML.Runtime.Data.IDataView input) Line 68  C#
    Microsoft.ML.Data.dll!Microsoft.ML.StaticPipe.Estimator<(Microsoft.ML.StaticPipe.Vector<float>, Microsoft.ML.StaticPipe.Key<uint, string>), (Microsoft.ML.StaticPipe.Scalar<string>, Microsoft.ML.StaticPipe.Scalar<string>, Microsoft.ML.StaticPipe.Vector<float>), Microsoft.ML.Core.Data.ITransformer>.Fit(Microsoft.ML.StaticPipe.DataView<(Microsoft.ML.StaticPipe.Vector<float>, Microsoft.ML.StaticPipe.Key<uint, string>)> view) Line 35    C#
[...]
sfilipi commented 5 years ago

Without a bit more information, I am afraid we might not be able to track this down.

ArieJones commented 5 years ago

@sfilipi I am getting the same thing using 0.9.0-preview-27129-2

I have set up a simple example with a few data items on https://github.com/ArieJones/MLNetWorkInProgress.git

I am getting the following error, which happens right as I try to fit the SDCA trainer with data. Any insight would be greatly appreciated.

System.InvalidOperationException
  HResult=0x80131509
  Message=Event we were waiting on was subject to an exception
  Source=Microsoft.ML.Core
  StackTrace:
   at Microsoft.ML.Runtime.Internal.Utilities.OrderedWaiter.Wait(Int64 position, CancellationToken token)
   at Microsoft.ML.Data.CacheDataView.WaiterWaiter.Wait(Int64 pos)
   at Microsoft.ML.Data.CacheDataView.RowCursor`1.MoveNext()
   at Microsoft.ML.Runtime.Training.TrainingCursorBase.MoveNext()
   at Microsoft.ML.Trainers.SdcaTrainerBase`3.TrainCore(IChannel ch, RoleMappedData data, LinearPredictor predictor, Int32 weightSetCount)
   at Microsoft.ML.Runtime.Learners.StochasticTrainerBase`2.TrainModelCore(TrainContext context)
   at Microsoft.ML.Runtime.Training.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at MLNetWorkInProgress.Program.Main(String[] args) in C:\Users\ArieJones\source\repos\MLNetWorkInProgress\MLNetWorkInProgress\Program.cs:line 93

Inner Exception 1: InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data

Inner Exception 2: IndexOutOfRangeException: Index was outside the bounds of the array.

sfilipi commented 5 years ago

@ArieJones there are a few things in your pipeline that might need to change:

1- You are training a regression trainer here, but are using a multi-class evaluator to evaluate it. I can only see two values in your data for the Label column. Do you have more than two? If so, use the multi-class SDCA.

2- Looking at your pipeline, I think the Carrier name is a categorical value; you might want to OneHot encode it rather than Featurize it.

3- Don't create a new Multiclass context here; use the one from mlContext.

This might help.

Nevertheless the error message is not helpful. Giving it the label bug for further investigation. Permalink: https://github.com/ArieJones/MLNetWorkInProgress/tree/7ecbe97573e23b2ba8d217703106f8ad7235a24a

ArieJones commented 5 years ago

@sfilipi Thanks for the feedback! I am looking at these changes now and will let you know if I narrow it down …

ArieJones commented 5 years ago

@sfilipi Ok, so I believe the issue is with the OneHotEncoding. I went through and systematically removed columns from my Features concatenation. It seems that when a column has only one value throughout, it throws this error.

Additionally, if I swap out OneHotEncoding for OneHotHashEncoding, it gets past this error. So something goofy is going on with the OneHotEncoding. If I get time, I will try to figure out what the exact culprit is.
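
The column-by-column elimination described above can be shortened by scanning the input file for columns that hold a single distinct value. The following is a minimal sketch, assuming a plain comma-separated file with a header row and no quoted fields. `ConstantColumnCheck` and `FindConstantColumns` are hypothetical names used for illustration only; they are not part of ML.NET, and the sketch uses nothing beyond the base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an ML.NET API): reports columns that never vary.
public static class ConstantColumnCheck
{
    // Returns the header names of columns with at most one distinct value
    // across all data rows. Assumes the first non-blank line is the header
    // and that fields never contain the separator inside quotes.
    public static List<string> FindConstantColumns(IEnumerable<string> lines, char separator = ',')
    {
        var names = new List<string>();
        var seen = new List<HashSet<string>>();
        bool headerRead = false;

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue;

            var fields = line.Split(separator);
            if (!headerRead)
            {
                foreach (var field in fields)
                {
                    names.Add(field.Trim());
                    seen.Add(new HashSet<string>());
                }
                headerRead = true;
                continue;
            }

            // Fields beyond the header width are ignored; short rows
            // simply contribute nothing to the missing columns.
            for (int i = 0; i < Math.Min(fields.Length, seen.Count); i++)
                seen[i].Add(fields[i].Trim());
        }

        return names.Where((name, i) => seen[i].Count <= 1).ToList();
    }
}
```

Calling `ConstantColumnCheck.FindConstantColumns(System.IO.File.ReadLines("data.csv"))` would list the candidate columns to drop or inspect before they reach the OneHotEncoding step. Whether a single-valued column is the actual trigger in a given version still has to be confirmed against the pipeline itself.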

Thanks! AJ

rauhs commented 5 years ago

@ArieJones I can confirm this. With the OHE removed, the exception is no longer thrown for me.

Also, in case it helps: I don't think I'm misusing the API. I'm using the static API and am not evaluating anything, just mapping a bunch of strings to float[]. The exact same code works fine up to 0.7.

sfilipi commented 5 years ago

Hi @ArieJones I tried out your sample, with version 0.9.0-preview-27207-3 and the error doesn't repro. It does indeed throw with 0.8; but it seems fixed now.

I can't push my branch to your repo (authorization error), but see the pipeline changes below.

class Program
{
    static void Main(string[] args)
    {
        var mlContext = new MLContext(0);

        //New Way Start

        TextLoader textLoader = mlContext.Data.CreateTextReader(new TextLoader.Arguments()
        {
            Separator = ",",
            HasHeader = true,
            Column = new[]
            {
                new TextLoader.Column("InsuranceCode", DataKind.Text, 0),
                new TextLoader.Column("CarrierName", DataKind.Text, 1),
                new TextLoader.Column("Address1", DataKind.Text, 2),
                new TextLoader.Column("Address2", DataKind.Text, 3),
                new TextLoader.Column("Zip", DataKind.Text, 4),
                new TextLoader.Column("DefaultProfileType", DataKind.Text, 5),
                new TextLoader.Column("CarrierId", DataKind.Text, 6),
                new TextLoader.Column("State", DataKind.Text, 7),
                new TextLoader.Column("Label", DataKind.R4, 8),
            }
        });

        var data = textLoader.Read(@"data.csv");

        // Step 2: Pipeline

        var transformPipeline = mlContext.Transforms.Categorical.OneHotEncoding("State")
                     .Append(mlContext.Transforms.Categorical.OneHotEncoding("DefaultProfileType"))
                     .Append(mlContext.Transforms.Categorical.OneHotEncoding("InsuranceCode"))
                     .Append(mlContext.Transforms.Categorical.OneHotEncoding("Zip"))
                     .Append(mlContext.Transforms.Text.FeaturizeText("CarrierName",
                                                                     "CarrierName",
                                                                     a =>
                                                                     {
                                                                         a.KeepDiacritics = false;
                                                                         a.KeepPunctuations = false;
                                                                         a.TextCase =
                                                                             TextNormalizingEstimator
                                                                                 .CaseNormalizationMode
                                                                                 .Lower;
                                                                         a.OutputTokens = true;
                                                                         a.VectorNormalizer =
                                                                             TextFeaturizingEstimator
                                                                                 .TextNormKind.L2;
                                                                     }))
                     .Append(mlContext.Transforms.Concatenate("Address",
                                                              "Address1",
                                                              "Address2"))
                     .Append(mlContext.Transforms.Text.FeaturizeText("Address",
                                                                     "Address",
                                                                     a =>
                                                                     {
                                                                         a.KeepDiacritics = false;
                                                                         a.KeepPunctuations = false;
                                                                         a.TextCase =
                                                                             TextNormalizingEstimator
                                                                                 .CaseNormalizationMode
                                                                                 .Lower;
                                                                         a.OutputTokens = true;
                                                                         a.VectorNormalizer =
                                                                             TextFeaturizingEstimator
                                                                                 .TextNormKind.L2;
                                                                     }))
                     .Append(mlContext.Transforms.Concatenate("Features",
                                                              "CarrierName",
                                                              "Address",
                                                              "Zip",
                                                              "State",
                                                              "DefaultProfileType",
                                                              "InsuranceCode"));

        var learner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(
                    labelColumn: DefaultColumnNames.Label, 
                    featureColumn: DefaultColumnNames.Features, 
                    maxIterations: 100);

        var transformedData = transformPipeline.Fit(data).Transform(data);

        var transform = learner.Fit(transformedData);
        var scoredData = transform.Transform(transformedData);

        var metrics = mlContext.MulticlassClassification.Evaluate(scoredData, "Label");

        PrintClassificationMetrics("XRef", metrics);
    }

    private static void PrintClassificationMetrics(string name, Microsoft.ML.Data.MultiClassClassifierMetrics metrics)
    {
        Console.WriteLine($"*************************************************");
        Console.WriteLine($"*       Metrics for {name}          ");
        Console.WriteLine($"*------------------------------------------------");
        Console.WriteLine($"*       Accuracy Macro: {metrics.AccuracyMacro}");
        Console.WriteLine($"*       Accuracy Micro: {metrics.AccuracyMicro}");
        Console.WriteLine($"*       Log Loss: {metrics.LogLoss}");
        Console.WriteLine($"*       Log Loss Reduction: {metrics.LogLossReduction}");
        Console.WriteLine($"*       Per Class Log Loss: {metrics.PerClassLogLoss}");
        Console.WriteLine($"*************************************************");
    }
}

The outputted metrics for this pipeline (100 iterations)


I am closing this one, but please feel free to reopen if you see problems.

bhrnjica commented 4 years ago

I got a similar exception with the ImageClassification.Train example on a custom image dataset of nearly 5000 images with JPG extensions. The pipeline is the same as in the provided example (machinelearning-samples/ImageClassification.Train).

I am using the latest 1.3.1 version of ML.NET.

However, I have another image set of nearly 100 images with PNG extensions, and the example works without any exceptions. Could the JPG image extension be the problem?

The following exception is thrown:

Training the ML.NET classification model
########################################

EXCEPTION
#########
System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data ---> System.ArgumentException: Parameter is not valid.
   at System.Drawing.Bitmap..ctor(String filename, Boolean useIcm)
   at Microsoft.ML.Data.ImageLoadingTransformer.Mapper.<>c__DisplayClass3_0.<MakeGetter>b__0(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImageResizingTransformer.Mapper.<>c__DisplayClass3_0.<MakeGetter>b__1(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImagePixelExtractingTransformer.Mapper.<>c__DisplayClass5_0`1.<GetGetterCore>b__1(VBuffer`1& dst)
   at Microsoft.ML.Transforms.TensorFlowTransformer.TensorValueGetterVec`1.GetTensor()
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.UpdateCacheIfNeeded(Int64 position, ITensorValueGetter[] srcTensorGetters, String[] activeOutputColNames, OutputCache outputCache)
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.<>c__DisplayClass9_0`1.<MakeGetter>b__4(VBuffer`1& dst)
   at Microsoft.ML.Data.DataViewUtils.Splitter.InPipe.Impl`1.Fill()
   at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass5_1.<ConsolidateCore>b__2()
   --- End of inner exception stack trace ---
   at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
   at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore()
   at Microsoft.ML.Data.RootCursorBase.MoveNext()
   at Microsoft.ML.Trainers.TrainingCursorBase.MoveNext()
   at Microsoft.ML.Trainers.LbfgsTrainerBase`3.TrainCore(IChannel ch, RoleMappedData data)
   at Microsoft.ML.Trainers.LbfgsTrainerBase`3.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at ImageClassification.Model.ModelBuilder.BuildAndTrain(IEnumerable`1 imageSet, IEnumerable`1 testSet) in C:\sc\github\MLdotImgClassification\ImageClassification.Train\Model\ModelBuilder.cs:line 102
   at ImageClassification.Train.RealProducts.Main(String[] args) in C:\sc\github\MLdotImgClassification\ImageClassification.Train\RealProducts_Train.cs:line 54
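
In the trace above, the inner exception is an `ArgumentException` ("Parameter is not valid") raised by the `System.Drawing.Bitmap` constructor, which usually points at a file that could not be decoded as an image rather than at the extension itself. One way to narrow this down is to scan the dataset folder for files whose leading bytes do not match the JPEG or PNG signature. The following is a rough sketch under that assumption. `ImageSignatureCheck` and its methods are hypothetical names for illustration, not part of ML.NET, and a signature check only catches empty or mislabelled files; a truncated or otherwise damaged image can still pass it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not an ML.NET API): flags files that do not start
// with a JPEG or PNG signature.
public static class ImageSignatureCheck
{
    // JPEG files start with FF D8 FF; PNG files start with an 8-byte signature.
    static readonly byte[] Jpeg = { 0xFF, 0xD8, 0xFF };
    static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    public static bool LooksLikeJpegOrPng(byte[] head) =>
        StartsWith(head, Jpeg) || StartsWith(head, Png);

    static bool StartsWith(byte[] data, byte[] prefix) =>
        data.Length >= prefix.Length && data.Take(prefix.Length).SequenceEqual(prefix);

    // Returns the paths under 'folder' whose first bytes match neither signature.
    public static List<string> FindSuspectFiles(string folder)
    {
        var suspects = new List<string>();
        foreach (var path in Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories))
        {
            var head = new byte[8];
            int read;
            using (var stream = File.OpenRead(path))
                read = stream.Read(head, 0, head.Length);

            // Only the bytes actually read are compared, so empty files are flagged too.
            if (!LooksLikeJpegOrPng(head.Take(read).ToArray()))
                suspects.Add(path);
        }
        return suspects;
    }
}
```

Running `ImageSignatureCheck.FindSuspectFiles` on the image folder before training and removing or re-exporting whatever it reports is a cheap first step. If nothing is flagged, loading each file individually and catching the exception would be the next thing to try.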