dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Error during TensorFlow image recognition training with ML.NET 1.4 Preview 2 #4369

Closed FilipRudzinski closed 4 years ago

FilipRudzinski commented 4 years ago

System information

Issue

Error when running the ImageClassification.Train sample with ML.NET 1.4 Preview 2.

Source code / logs

Exception:

System.ArgumentOutOfRangeException: The size of input lines is not consistent
Parameter name: Source
   at Microsoft.ML.Data.TextLoader.Bindings..ctor(TextLoader parent, Column[] cols, IMultiStreamSource headerFile, IMultiStreamSource dataSample)
   at Microsoft.ML.Data.TextLoader..ctor(IHostEnvironment env, Options options, IMultiStreamSource dataSample)
   at Microsoft.ML.Transforms.ImageClassificationTransformer.GetShuffledData(String path)
   at Microsoft.ML.Transforms.ImageClassificationTransformer.TrainAndEvaluateClassificationLayer(String trainBottleneckFilePath, Options options, String validationSetBottleneckFilePath)
   at Microsoft.ML.Transforms.ImageClassificationTransformer..ctor(IHostEnvironment env, Options options, DnnModel tensorFlowModel, IDataView input)
   at Microsoft.ML.Transforms.ImageClassificationEstimator.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at ImageClassification.Train.Program.Main(String[] args) in C:\Projekty\Experimental\v14\DeepLearning_TensorFlowEstimator\ImageClassification.Train\Program.cs:line 78
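In the trace above, the exception is thrown while a `TextLoader` is being constructed inside `ImageClassificationTransformer.GetShuffledData(String path)`. Going by the parameter names, that path is a bottleneck file written during training. The message suggests that the lines of the text being loaded do not all contain the same number of fields. The following is a minimal diagnostic sketch under stated assumptions: the file is plain delimited text, and you know or can guess its separator. It reports every line whose field count differs from the first non-empty line. `find_inconsistent_lines`, `check_file`, and the `sep` parameter are hypothetical helpers, not part of ML.NET. The script does not reproduce ML.NET's own parsing rules, such as quoting.

```python
from typing import Iterable, List, Tuple


def find_inconsistent_lines(lines: Iterable[str], sep: str = ",") -> List[Tuple[int, int]]:
    """Return (1-based line number, field count) for each non-empty line whose
    field count differs from the field count of the first non-empty line."""
    expected = None
    mismatches = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines are ignored
        count = len(line.split(sep))  # naive split: no quoting support
        if expected is None:
            expected = count  # the first data line sets the reference width
        elif count != expected:
            mismatches.append((lineno, count))
    return mismatches


def check_file(path: str, sep: str = ",") -> None:
    """Print every offending line of a text file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        for lineno, count in find_inconsistent_lines(f, sep):
            print(f"line {lineno}: {count} fields")
```

Example call: `check_file("path/to/bottleneck-file.txt", sep=",")`. Both the path and the separator are placeholders; substitute whatever file and delimiter your workspace actually contains.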


ashbhandare commented 4 years ago

What dataset were you using for this run? Also, could you give a reproducer?

FilipRudzinski commented 4 years ago

The steps are simple:
1. Download the sample: https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_Training/ImageClassification.Train
2. Run it, and you get the error.

But here's the thing: the sample works on macOS (MacBook Air) but doesn't work on Windows 10. I have tried it on 3 different machines and got the same error on all of them.

rjlexx commented 4 years ago

I have the same exception on my Win 10 x64 PC:

System.ArgumentOutOfRangeException: 'The size of input lines is not consistent Arg_ParamName_Name'
The size of input lines is not consistent
Parameter name: Source
   at Microsoft.ML.Data.TextLoader.Bindings..ctor(TextLoader parent, Column[] cols, IMultiStreamSource headerFile, IMultiStreamSource dataSample)
   at Microsoft.ML.Data.TextLoader..ctor(IHostEnvironment env, Options options, IMultiStreamSource dataSample)
   at Microsoft.ML.Transforms.ImageClassificationTransformer.GetShuffledData(String path)
   at Microsoft.ML.Transforms.ImageClassificationTransformer.TrainAndEvaluateClassificationLayer(String trainBottleneckFilePath, Options options, String validationSetBottleneckFilePath)
   at Microsoft.ML.Transforms.ImageClassificationTransformer..ctor(IHostEnvironment env, Options options, DnnModel tensorFlowModel, IDataView input)
   at Microsoft.ML.Transforms.ImageClassificationEstimator.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at ImageClassification.Train.Program.Main() in E:\ImageClassification_Training\ImageClassification.Train\Program.cs:line 79

If I change the architecture to ResnetV2101, the training process finishes successfully. However, the trained model gives completely wrong predictions, with a strange Score=1.

(screenshot)

ashbhandare commented 4 years ago

@rjlexx @FilipRudzinski While creating your ImageClassification pipeline, could you try explicitly setting the options shown here: https://github.com/dotnet/machinelearning/blob/0b9308e11b2d8e385339aa866a0aaf60f4fc54b2/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/ImageClassification/LearningRateSchedulingCifarResnetTransferLearning.cs#L92 and https://github.com/dotnet/machinelearning/blob/0b9308e11b2d8e385339aa866a0aaf60f4fc54b2/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/ImageClassification/LearningRateSchedulingCifarResnetTransferLearning.cs#L91

rjlexx commented 4 years ago

@ashbhandare, unfortunately it doesn't help. It looks strange that the accuracy is so low (0.2) and that training runs only 24 epochs instead of 100.

(screenshot)

(screenshot)

(screenshot)

frank-dong-ms-zz commented 4 years ago

@FilipRudzinski @rjlexx ML.NET 1.5.0 has been released. Please try it out to see whether you can still reproduce the issue. I tried 1.5.0 with the provided samples and everything seems to work fine. Thanks.

frank-dong-ms-zz commented 4 years ago

Closing this since there has been no feedback from the user. Feel free to reopen if necessary.