Open bartosz-wozniak-lx opened 5 years ago
Can you point to the exact sample URL you are using when having this issue?
Same issue here with the sample above. If I switch to the ResNet architecture, I don't get this error, but I get this output (shortened):
Saver not created because there are no variables in the graph to restore
*** Training the image classification model with DNN Transfer Learning on top of the selected pre-trained model/architecture ***
Training with transfer learning took: 0 seconds
Phase: Bottleneck Computation, Dataset used: Train, Image Index: 1, Image Name:
Phase: Bottleneck Computation, Dataset used: Train, Image Index: 2, Image Name:
Phase: Bottleneck Computation, Dataset used: Train, Image Index: 168, Image Name:
Phase: Bottleneck Computation, Dataset used: Train, Image Index: 169, Image Name:
Phase: Bottleneck Computation, Dataset used: Train, Image Index: 170, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 1, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 2, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 24, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 25, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 26, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 27, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 28, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 29, Image Name:
Phase: Bottleneck Computation, Dataset used: Validation, Image Index: 30, Image Name:
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 0, Accuracy: 0,4882352, Cross-Entropy: 7,081955E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 0, Accuracy: 0,1
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 1, Accuracy: 0,4705882, Cross-Entropy: 6,169476E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 1, Accuracy: 0,1
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 2, Accuracy: 0,5294118, Cross-Entropy: 4,885881E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 2, Accuracy: 0,2666667
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 3, Accuracy: 0,4764706, Cross-Entropy: 4,958395E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 3, Accuracy: 0,2333333
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 4, Accuracy: 0,4941176, Cross-Entropy: 5,939554E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 4, Accuracy: 0,2
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 5, Accuracy: 0,4882352, Cross-Entropy: 5,615148E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 5, Accuracy: 0
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 6, Accuracy: 0,5294118, Cross-Entropy: 5,2967E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 6, Accuracy: 0,3
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 7, Accuracy: 0,5470588, Cross-Entropy: 4,233541E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 7, Accuracy: 0,2666667
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 8, Accuracy: 0,5235294, Cross-Entropy: 3,271615E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 8, Accuracy: 0,2
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 9, Accuracy: 0,5235295, Cross-Entropy: 5,215298E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 9, Accuracy: 0,1666667
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 10, Accuracy: 0,5117648, Cross-Entropy: 3,941893E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 10, Accuracy: 0,1666667
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 11, Accuracy: 0,4, Cross-Entropy: 5,180342E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 11, Accuracy: 0,2
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 12, Accuracy: 0,4529412, Cross-Entropy: 5,114708E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 12, Accuracy: 0,1333333
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 13, Accuracy: 0,4823529, Cross-Entropy: 4,743001E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 13, Accuracy: 0,3333333
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 14, Accuracy: 0,5529412, Cross-Entropy: 4,846893E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 14, Accuracy: 0,1
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 15, Accuracy: 0,4647059, Cross-Entropy: 5,490917E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 15, Accuracy: 0,1
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 16, Accuracy: 0,5470588, Cross-Entropy: 4,3319E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 16, Accuracy: 0,2333333
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 17, Accuracy: 0,5470588, Cross-Entropy: 5,035258E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 30, Accuracy: 0,1333333
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 31, Accuracy: 0,5529411, Cross-Entropy: 4,524144E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 31, Accuracy: 0,2
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 32, Accuracy: 0,5411765, Cross-Entropy: 3,512103E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 32, Accuracy: 0,1666667
Phase: Training, Dataset used: Train, Batch Processed Count: 17, Epoch: 33, Accuracy: 0,5058823, Cross-Entropy: 4,79253E+12
Phase: Training, Dataset used: Validation, Batch Processed Count: 3, Epoch: 33, Accuracy: 0,2
Saver not created because there are no variables in the graph to restore
Restoring parameters from C:\Users\Tjä\Documents\Xtoolkit\ML\X_MLServer\XInceptionTrainer\bin\Debug\Xreapp2.1\custom_retrained_model_based_on_resnet_v2_101_299.meta
Froze 2 variables.
Converted 2 variables to const ops.
Making predictions in bulk for evaluating model's quality...
************************************************************
* Metrics for TensorFlow DNN Transfer Learning multi-class classification model
*-----------------------------------------------------------
AccuracyMacro = 0,25, a value between 0 and 1, the closer to 1, the better
AccuracyMicro = 0,2333, a value between 0 and 1, the closer to 1, the better
LogLoss = 26,4797, the closer to 0, the better
LogLoss for class 1 = 34,5388, the closer to 0, the better
LogLoss for class 2 = 11,5129, the closer to 0, the better
LogLoss for class 3 = 25,9041, the closer to 0, the better
LogLoss for class 4 = 23,0259, the closer to 0, the better
LogLoss for class 5 = 34,5388, the closer to 0, the better
************************************************************
Predicting and Evaluation took: 16 seconds
It seems the training isn't actually doing anything; the epochs go by very fast. This may be related.
Here is the stack trace I get when using the Inception architecture:
HResult=0x80131502
Message=The size of input lines is not consistent
Parameter name: Source
Source=Microsoft.ML.Data
StackTrace:
at Microsoft.ML.Data.TextLoader.Bindings..ctor(TextLoader parent, Column[] cols, IMultiStreamSource headerFile, IMultiStreamSource dataSample) in E:\A\_work\656\s\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoader.cs:line 689
at Microsoft.ML.Data.TextLoader..ctor(IHostEnvironment env, Options options, IMultiStreamSource dataSample) in E:\A\_work\656\s\src\Microsoft.ML.Data\DataLoadSave\Text\TextLoader.cs:line 1187
at Microsoft.ML.Transforms.ImageClassificationTransformer.GetShuffledData(String path) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 300
at Microsoft.ML.Transforms.ImageClassificationTransformer.TrainAndEvaluateClassificationLayer(String trainBottleneckFilePath, Options options, String validationSetBottleneckFilePath) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 329
at Microsoft.ML.Transforms.ImageClassificationTransformer..ctor(IHostEnvironment env, Options options, DnnModel tensorFlowModel, IDataView input) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 180
at Microsoft.ML.Transforms.ImageClassificationEstimator.Fit(IDataView input) in E:\A\_work\656\s\src\Microsoft.ML.Dnn\ImageClassificationTransform.cs:line 1520
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input) in E:\A\_work\656\s\src\Microsoft.ML.Data\DataLoadSave\EstimatorChain.cs:line 67
at ImageClassification.Train.Program.Main() in C:\Users\Tjä\Documents\Xtoolkit\ML\X_MLServer\XInceptionTrainer\Program.cs:line 474
Adding @codemzs - I'm not able to reproduce it. Also, we're updating to ML.NET 1.4, which will be released in a couple of days. There are some changes in this API (Image Classification). You might want to try the new version soon.
@thoj @bartosz-wozniak-lx Which OS are you running this sample on? Can you also confirm that you have not altered the samples at all? If you have, please provide the exact repro steps.
@thoj @bartosz-wozniak-lx In addition, could you add the complete output log, from start to the stack trace? It will be helpful to find out where the failure occurs.
One way I found to reproduce the error is to interrupt the run while the bottleneck values are being calculated and then, in the subsequent run, try to reuse the bottleneck values. The cached file created by the interrupted run does not contain the complete dataset. In this case, the error can be resolved either by not reusing the bottleneck values or by deleting the cached files. There could be a different cause for this error as well. @thoj and @bartosz-wozniak-lx, can you confirm whether your repro steps are different from this?
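To check whether a cached file was truncated in the way described above, a small script can count the fields on each line and flag any line that differs from the rest. This is only a minimal sketch: it assumes the cached bottleneck files are plain comma-delimited text, and the function names and the folder you point it at are hypothetical, not part of ML.NET.

```python
import csv
from collections import Counter
from pathlib import Path


def find_inconsistent_lines(csv_path, delimiter=","):
    """Return (most_common_field_count, [(line_number, field_count), ...]).

    Assumption: the file is plain delimited text, one record per line.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        counts = [len(row) for row in csv.reader(handle, delimiter=delimiter)]
    if not counts:
        return 0, []
    # Treat the most frequent field count as the expected one.
    expected = Counter(counts).most_common(1)[0][0]
    bad = [(number, count)
           for number, count in enumerate(counts, start=1)
           if count != expected]
    return expected, bad


def check_folder(folder):
    """Run the check on every .csv file directly inside `folder`."""
    return {path.name: find_inconsistent_lines(path)
            for path in sorted(Path(folder).glob("*.csv"))}
```

Calling `check_folder` on the sample's output folder returns, per file, the expected field count and the lines that deviate from it. A non-empty list for a file suggests that it is worth deleting before the next run, as suggested above.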
@codemzs I am running this sample on Windows 10. I have tried a few different computers. It works perfectly on OSX but does not work on Windows 10. Yes, I can confirm, I have not altered the samples at all. I have just downloaded it and run it.
@bartosz-wozniak-lx Thanks! On Windows 10, were you running on .NET Core or .NET Framework?
I have tried both (.NET Core and .NET Framework). Neither worked.
@bartosz-wozniak-lx In your bin folder there should be two “.csv” files that are created after a run of the image classification API. Can you please send them here?
I'm getting the following exception when I try to run the ImageClassification.Train sample: System.ArgumentOutOfRangeException: The size of input lines is not consistent.
This occurs on the line "ITransformer trainedModel = pipeline.Fit(trainDataView);"
Is there any fix or workaround, please?