dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Strange Issues with Image Classification Transfer Learning Sample #653

Open: reneschulte opened this issue 4 years ago

reneschulte commented 4 years ago

It seems the training isn't working well. Even after 500 epochs the accuracy is far too low. I also tried the full 3,600-image dataset, but it wasn't much better, so I suspect something is wrong here.

I just cloned the repo and got these very low accuracy results using VS 2019 with either .NET Core 2.1 or 3.0.

Is anyone else seeing this with the sample, or is it just me (maybe my machine configuration)? https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_Training

See the super low accuracy here: [screenshot]

I've also tried InceptionV3 instead of ResnetV2101 as the architecture, but that throws an ArgumentOutOfRangeException: [screenshot]

CESARDELATORRE commented 4 years ago

We're not able to reproduce it; several people have tried this sample and get the expected accuracy of around 90%. It must be something related to your environment, but it is weird. Let's keep the issue open until the cause has been isolated, OK?

reneschulte commented 4 years ago

Agreed. It's probably a weird edge case that only a few people will run into, but it's still worth investigating, since it could be surfacing a deeper, hidden issue that only shows up under my machine's conditions. I'll try another machine tomorrow.

Any other ideas I should try?

CESARDELATORRE commented 4 years ago

Adding @codemzs; he might have ideas on how to find out what's causing this issue.

iangithub commented 4 years ago

I tried it, and it looks fine for me: [screenshot]

reneschulte commented 4 years ago

I've tried another machine (machine b) and get good results there.

Machine a), which gets these bad results, runs Windows 10 Version 1903 (OS Build 18362.356). It also has Python, TensorFlow-Base, and TensorFlow-GPU 1.12 installed via Anaconda. Maybe one of those installs sets a registry key that makes the sample pick up an old library or something similar?

Again, I think it still makes sense to narrow down the root cause on machine a), as others might run into the same thing.

Fontijne commented 4 years ago

Hello All,

I'm facing exactly the same problems. Even after training on the large dataset, the accuracy is still very low. Most of the predictions are wrong, but the score is always 1.

I also tried the other architecture (InceptionV3) and got the same error.

My system is Windows 10 Pro (Build 18362).

Any suggestions are more than welcome.

CESARDELATORRE commented 4 years ago

@codemzs, can you advise on this issue? Something in the infrastructure/OS might be causing these strange issues. We're not able to repro.

reneschulte commented 4 years ago

Hi, I just tried again with the latest from master (ddbe84ff23943fdc528178a17dc823f5d4802705), and I think it's still failing to train properly. This is the result when I plug the trained imageClassifier.zip model into the Predict program:

[screenshot]

FloP93 commented 4 years ago

I also have a similar problem. The accuracy is very low: [screenshot]

Also, if I try to run it on the GPU with SciSharp.TensorFlow.Redist-Windows-GPU (1.14.1 or 1.14.0), I get the following error: [screenshot]

My specs: Windows 10, Visual Studio Enterprise 2017, 8 GB RAM, AMD Ryzen processor, GeForce GTX 1070 graphics card.

Hope that helps.

Edit: With InceptionV3 I get: [screenshot]

IveJ commented 4 years ago

Hi friend,

Which loss function did you implement?


FloP93 commented 4 years ago


I used the pretrained networks MobilenetV2 and InceptionV3, which to my knowledge both use ReLU in the hidden layers and softmax in the output layer. For the test I used the ImageClassification.Train sample: https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_Training/ImageClassification.Train

reneschulte commented 4 years ago

Same as @FloP93: I just used the pre-trained model with the sample, nothing else.