dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

The size of input lines is not consistent #717

Open bartosz-wozniak-lx opened 5 years ago

bartosz-wozniak-lx commented 5 years ago

I'm getting the following exception: System.ArgumentOutOfRangeException: The size of input lines is not consistent. It happens when I try to run the ImageClassification.Train sample.

It occurs on the line "ITransformer trainedModel = pipeline.Fit(trainDataView);"

Is there any fix or workaround, please?

CESARDELATORRE commented 5 years ago

Can you point to the exact sample URL you are using when having this issue?

bartosz-wozniak-lx commented 5 years ago

Yes, of course: https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_Training

thoj commented 5 years ago

Same issue here with the sample above. If I switch to the ResNet architecture I don't get this error, but I get this output (shortened):

Saver not created because there are no variables in the graph to restore
*** Training the image classification model with DNN Transfer Learning on top of the selected pre-trained model/architecture ***
Training with transfer learning took: 0 seconds
Phase: Bottleneck Computation, Dataset used:      Train, Image Index:   1, Image Name:
Phase: Bottleneck Computation, Dataset used:      Train, Image Index:   2, Image Name:
Phase: Bottleneck Computation, Dataset used:      Train, Image Index: 168, Image Name:
Phase: Bottleneck Computation, Dataset used:      Train, Image Index: 169, Image Name:
Phase: Bottleneck Computation, Dataset used:      Train, Image Index: 170, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:   1, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:   2, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:  24, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:  25, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:  26, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:  27, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:  28, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:  29, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:  30, Image Name:
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   0, Accuracy:  0,4882352, Cross-Entropy: 7,081955E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   0, Accuracy:        0,1
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   1, Accuracy:  0,4705882, Cross-Entropy: 6,169476E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   1, Accuracy:        0,1
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   2, Accuracy:  0,5294118, Cross-Entropy: 4,885881E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   2, Accuracy:  0,2666667
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   3, Accuracy:  0,4764706, Cross-Entropy: 4,958395E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   3, Accuracy:  0,2333333
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   4, Accuracy:  0,4941176, Cross-Entropy: 5,939554E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   4, Accuracy:        0,2
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   5, Accuracy:  0,4882352, Cross-Entropy: 5,615148E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   5, Accuracy:          0
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   6, Accuracy:  0,5294118, Cross-Entropy: 5,2967E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   6, Accuracy:        0,3
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   7, Accuracy:  0,5470588, Cross-Entropy: 4,233541E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   7, Accuracy:  0,2666667
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   8, Accuracy:  0,5235294, Cross-Entropy: 3,271615E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   8, Accuracy:        0,2
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:   9, Accuracy:  0,5235295, Cross-Entropy: 5,215298E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:   9, Accuracy:  0,1666667
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  10, Accuracy:  0,5117648, Cross-Entropy: 3,941893E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  10, Accuracy:  0,1666667
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  11, Accuracy:        0,4, Cross-Entropy: 5,180342E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  11, Accuracy:        0,2
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  12, Accuracy:  0,4529412, Cross-Entropy: 5,114708E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  12, Accuracy:  0,1333333
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  13, Accuracy:  0,4823529, Cross-Entropy: 4,743001E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  13, Accuracy:  0,3333333
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  14, Accuracy:  0,5529412, Cross-Entropy: 4,846893E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  14, Accuracy:        0,1
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  15, Accuracy:  0,4647059, Cross-Entropy: 5,490917E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  15, Accuracy:        0,1
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  16, Accuracy:  0,5470588, Cross-Entropy: 4,3319E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  16, Accuracy:  0,2333333
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  17, Accuracy:  0,5470588, Cross-Entropy: 5,035258E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  30, Accuracy:  0,1333333
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  31, Accuracy:  0,5529411, Cross-Entropy: 4,524144E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  31, Accuracy:        0,2
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  32, Accuracy:  0,5411765, Cross-Entropy: 3,512103E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  32, Accuracy:  0,1666667
Phase: Training, Dataset used:      Train, Batch Processed Count:  17, Epoch:  33, Accuracy:  0,5058823, Cross-Entropy: 4,79253E+12
Phase: Training, Dataset used: Validation, Batch Processed Count:   3, Epoch:  33, Accuracy:        0,2
Saver not created because there are no variables in the graph to restore
Restoring parameters from C:\Users\Tjä\Documents\Xtoolkit\ML\X_MLServer\XInceptionTrainer\bin\Debug\Xreapp2.1\custom_retrained_model_based_on_resnet_v2_101_299.meta
Froze 2 variables.
Converted 2 variables to const ops.
Making predictions in bulk for evaluating model's quality...
************************************************************
*    Metrics for TensorFlow DNN Transfer Learning multi-class classification model
*-----------------------------------------------------------
    AccuracyMacro = 0,25, a value between 0 and 1, the closer to 1, the better
    AccuracyMicro = 0,2333, a value between 0 and 1, the closer to 1, the better
    LogLoss = 26,4797, the closer to 0, the better
    LogLoss for class 1 = 34,5388, the closer to 0, the better
    LogLoss for class 2 = 11,5129, the closer to 0, the better
    LogLoss for class 3 = 25,9041, the closer to 0, the better
    LogLoss for class 4 = 23,0259, the closer to 0, the better
    LogLoss for class 5 = 34,5388, the closer to 0, the better
************************************************************
Predicting and Evaluation took: 16 seconds

It seems the training isn't actually doing anything; the epochs go by very fast. This could be related.

Here is the stack trace when trying to use the Inception architecture:

  HResult=0x80131502
  Message=The size of input lines is not consistent
Parameter name: Source
  Source=Microsoft.ML.Data
  StackTrace:
   at Microsoft.ML.Data.TextLoader.Bindings..ctor(TextLoader parent, Column[] cols, IMultiStreamSource headerFile, IMultiStreamSource dataSample) in E:\A\_work\656\s\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoader.cs:line 689
   at Microsoft.ML.Data.TextLoader..ctor(IHostEnvironment env, Options options, IMultiStreamSource dataSample) in E:\A\_work\656\s\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoader.cs:line 1187
   at Microsoft.ML.Transforms.ImageClassificationTransformer.GetShuffledData(String path) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 300
   at Microsoft.ML.Transforms.ImageClassificationTransformer.TrainAndEvaluateClassificationLayer(String trainBottleneckFilePath, Options options, String validationSetBottleneckFilePath) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 329
   at Microsoft.ML.Transforms.ImageClassificationTransformer..ctor(IHostEnvironment env, Options options, DnnModel tensorFlowModel, IDataView input) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 180
   at Microsoft.ML.Transforms.ImageClassificationEstimator.Fit(IDataView input) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 1520
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in E:\A\_work\656\s\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 67
   at ImageClassification.Train.Program.Main() in C:\Users\Tjä\Documents\Xtoolkit\ML\X_MLServer\XInceptionTrainer\Program.cs:line 474

CESARDELATORRE commented 5 years ago

Adding @codemzs - I'm not able to reproduce it. Also, we're updating to ML.NET 1.4, which will be released in a couple of days. There are some changes in this API (Image Classification). You might want to try the new version soon.

codemzs commented 5 years ago

@thoj @bartosz-wozniak-lx What OS are you running this sample on? Can you confirm that you have not altered the samples at all? If you have, please provide the exact repro steps.

ashbhandare commented 5 years ago

@thoj @bartosz-wozniak-lx In addition, could you add the complete output log, from the start to the stack trace? It would help us find out where the failure occurs.

ashbhandare commented 5 years ago

One way I found to reproduce the error is to interrupt the run while bottleneck values are being calculated and then, in the subsequent run, try to reuse the bottleneck values. The cached file created this way does not contain the complete dataset. In this case, the error can be resolved by either not reusing the bottleneck values or deleting the cached files. There could be a different cause for this error as well. @thoj and @bartosz-wozniak-lx, can you confirm whether your repro steps are different from this?

bartosz-wozniak-lx commented 5 years ago

@codemzs I am running this sample on Windows 10. I have tried a few different computers. It works perfectly on OSX but does not work on Windows 10. Yes, I can confirm, I have not altered the samples at all. I have just downloaded it and run it.

codemzs commented 5 years ago

@bartosz-wozniak-lx Thanks! On Windows 10, were you running on .NET Core or .NET Framework?

bartosz-wozniak-lx commented 5 years ago

I have tried both (Core and Framework). Neither of them worked.

codemzs commented 5 years ago

@bartosz-wozniak-lx In your bin folder there should be two “.csv” files that are created after a run of the image classification API. Can you please send them here?
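
For anyone who wants to inspect those files locally before attaching them, here is a minimal C# sketch that reports which .csv files in a folder contain rows with differing field counts, which appears to be the condition the exception message refers to. It assumes the files are plain comma-separated text without quoted fields. The class and method names are hypothetical and are not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class CsvConsistencyCheck
{
    // Returns true when every non-empty line splits into the same number of fields.
    // Assumption: fields are separated by `separator` and are never quoted.
    public static bool HasConsistentFieldCount(string path, char separator = ',')
    {
        int? expected = null;
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length == 0)
                continue; // ignore blank lines, e.g. a trailing newline

            int count = line.Split(separator).Length;
            if (expected == null)
                expected = count;          // the first data line sets the reference
            else if (count != expected.Value)
                return false;              // a row with a different size was found
        }
        return true;
    }

    // Lists the *.csv files directly inside `directory` that fail the check above.
    public static List<string> FindInconsistentCsvFiles(string directory)
    {
        return Directory.GetFiles(directory, "*.csv")
            .Where(file => !HasConsistentFieldCount(file))
            .ToList();
    }
}
```

Calling `CsvConsistencyCheck.FindInconsistentCsvFiles(binFolder)` with the path of the bin folder returns the suspect files. If an interrupted run left an incomplete cache, as described earlier in the thread, deleting the flagged file (or not reusing bottleneck values) may let the next run recompute it.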