dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Image classification training error: Unable to find an entry point named 'TF_StringEncodedSize' in shared library 'tensorflow' #876

Open zeroskyx opened 3 years ago

zeroskyx commented 3 years ago

Greetings,

Running the "Train a deep learning image classification model with ML.NET and TensorFlow" example as-is throws the following exception:

I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299965000 Hz
Unhandled exception. System.EntryPointNotFoundException: Unable to find an entry point named 'TF_StringEncodedSize' in shared library 'tensorflow'.
   at Tensorflow.c_api.TF_StringEncodedSize(UInt64 len)
   at Microsoft.ML.Vision.ImageClassificationTrainer.EncodeByteAsString(VBuffer`1 buffer)
   at Microsoft.ML.Vision.ImageClassificationTrainer.ImageProcessor.ProcessImage(VBuffer`1& imageBuffer)
   at Microsoft.ML.Vision.ImageClassificationTrainer.CacheFeaturizedImagesToDisk(IDataView input, String labelColumnName, String imageColumnName, ImageProcessor imageProcessor, String inputTensorName, String outputTensorName, String cacheFilePath, Dataset dataset, Action`1 metricsCallback, Nullable`1 validationFraction)
   at Microsoft.ML.Vision.ImageClassificationTrainer.TrainModelCore(TrainContext trainContext)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at DeepLearning_ImageClassification.Program.Main(String[] args)

Is this an issue with the sample, or with the ML.NET implementation?

13005463562 commented 3 years ago

I have the same problem

fuszenecker commented 1 year ago

https://github.com/dotnet/machinelearning-samples/issues/880

NakanoMiku13 commented 3 months ago

In case it helps anyone: I've read too many posts, but the only thing that worked for me was downgrading TensorFlow.NET (0.150.0 -> 0.100.5) and TensorFlow.Keras (0.15.0 -> 0.10.5). My setup: CUDA 11.7, cuDNN 8.9, QUADRO MOBILE T1000 8GB.

<PackageReference Include="TensorFlow.NET" Version="0.100.5" />
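
For reference, both downgrades mentioned above might look like this in the project file. This is only a sketch: it assumes both packages are referenced directly from the .csproj, and the surrounding ItemGroup is an assumption about the project layout rather than something taken from the sample.

```xml
<!-- Sketch only: pins the two versions named in this comment. -->
<!-- The ItemGroup wrapper is assumed; merge into your existing one. -->
<ItemGroup>
  <PackageReference Include="TensorFlow.NET" Version="0.100.5" />
  <PackageReference Include="TensorFlow.Keras" Version="0.10.5" />
</ItemGroup>
```

After changing the versions, a package restore and rebuild is needed for the downgraded packages to be picked up.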

But I then ran into other problems, such as a missing DLL (Could not locate zlibwapi.dll. Please make sure it is in your library path). I fixed it by copying the file zlib.dll from C:\Program Files\NVIDIA Corporation\Nsight Systems 2022.1.3\host-windows-x64 and pasting it into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin

The next problem I hit after that was another error (Tensorflow.TensorflowException: 'DNN library is not found.'). The forums I searched only suggested that it could be the cuDNN version, so I downgraded again, this time from cuDNN 8.9 to cuDNN 8.4.1 (verify that it is the build for CUDA 11.x). That solved all the problems.

My current problem is OOM (so sad), but it finally works.

Hopefully it works for you.

Where I found the information, plus other links that could help:
https://github.com/SciSharp/TensorFlow.NET/issues/1224
https://forum.image.sc/t/dnn-library-is-not-found-problem-with-tensorflow/81673
https://developer.nvidia.com/compute/cudnn/secure/8.4.1/local_installers/11.6/cudnn-windows-x86_64-8.4.1.50_cuda11.6-archive.zip (for Windows)
https://developer.nvidia.com/rdp/cudnn-archive
https://stackoverflow.com/questions/72356588/could-not-locate-zlibwapi-dll-please-make-sure-it-is-in-your-library-path