dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

TensorFlow exception triggered while loading Inception model in a Jupyter C# notebook #4730

Closed: mdfarragher closed this issue 4 years ago

mdfarragher commented 4 years ago

Issue

I'm running a C# kernel in an Anaconda3 Jupyter server as per these instructions: https://www.hanselman.com/blog/AnnouncingNETJupyterNotebooks.aspx. This works great and my notebooks can run most ML.NET demos without any trouble.

I am now building a C# Jupyter notebook that demonstrates the object detection capabilities of the ML.NET library. I'm trying to run the DeepLearning_ImageClassification_TensorFlow demo (here: https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_TensorFlow).

Everything works fine up until this point in the code:

var temp = mlContext.Model.LoadTensorFlowModel("models/tensorflow_inception_graph.pb");

This method LoadTensorFlowModel throws an exception: System.FormatException: Tensorflow exception triggered while loading model.

I expected the method to load the TensorFlow Inception graph, which is what happens when I run the same code from the Windows command line.

Source code / logs

The following is a minimal Jupyter C# notebook with only 2 code cells to reproduce the issue:

Code cell 1

#r nuget:Microsoft.ML
#r nuget:Microsoft.ML.ImageAnalytics
#r nuget:Microsoft.ML.TensorFlow

Code cell 2

using Microsoft.ML;
using Microsoft.ML.Data;
using System.IO;

// create a machine learning context
var mlContext = new MLContext();

// load the inception graph
var temp = mlContext.Model.LoadTensorFlowModel("models/tensorflow_inception_graph.pb");

The full exception is:

System.FormatException: Tensorflow exception triggered while loading model.
 ---> System.ArgumentNullException: Value cannot be null. (Parameter 'libraryPath')
   at System.Runtime.InteropServices.NativeLibrary.Load(String libraryPath)
   at MLS.Agent.NativeAssemblyLoadHelper.Resolve(String libraryName, Assembly assembly, Nullable`1 searchPath) in F:\workspace\_work\1\s\MLS.Agent\NativeAssemblyLoadHelper.cs:line 47
   at System.Runtime.InteropServices.NativeLibrary.LoadLibraryCallbackStub(String libraryName, Assembly assembly, Boolean hasDllImportSearchPathFlags, UInt32 dllImportSearchPathFlags)
   at Tensorflow.c_api.TF_NewGraph()
   at Tensorflow.Graph..ctor()
   at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTFSessionByModelFilePath(IExceptionContext ectx, String modelFile, Boolean metaGraph)
   --- End of inner exception stack trace ---
   at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTFSessionByModelFilePath(IExceptionContext ectx, String modelFile, Boolean metaGraph)
   at Microsoft.ML.TensorFlow.TensorFlowUtils.GetSession(IHostEnvironment env, String modelPath, Boolean metaGraph)
   at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTensorFlowModel(IHostEnvironment env, String modelPath)
   at Microsoft.ML.TensorflowCatalog.LoadTensorFlowModel(ModelOperationsCatalog catalog, String modelLocation)
   at Submission#11.<<Initialize>>d__0.MoveNext()

And the inner exception is:

System.ArgumentNullException: Value cannot be null. (Parameter 'libraryPath')
   at System.Runtime.InteropServices.NativeLibrary.Load(String libraryPath)
   at MLS.Agent.NativeAssemblyLoadHelper.Resolve(String libraryName, Assembly assembly, Nullable`1 searchPath) in F:\workspace\_work\1\s\MLS.Agent\NativeAssemblyLoadHelper.cs:line 47
   at System.Runtime.InteropServices.NativeLibrary.LoadLibraryCallbackStub(String libraryName, Assembly assembly, Boolean hasDllImportSearchPathFlags, UInt32 dllImportSearchPathFlags)
   at Tensorflow.c_api.TF_NewGraph()
   at Tensorflow.Graph..ctor()
   at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTFSessionByModelFilePath(IExceptionContext ectx, String modelFile, Boolean metaGraph)

I installed the TensorFlow package in Anaconda3, but my hunch is that the wrapper code in ML.NET is somehow unable to find the native library, possibly because dotnet-try is hosting the C# code.
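The outer FormatException hides the useful detail, which sits in the inner ArgumentNullException for the 'libraryPath' parameter. As a rough sketch of how a notebook cell could surface that cause, the helper below walks the inner-exception chain. `FindNativeLoadFailure` is a hypothetical name and not part of ML.NET, and the exception here is simulated with the same shape as the trace above so that the cell runs without any NuGet packages.

```csharp
using System;

// Hypothetical helper (not part of ML.NET): walks the InnerException chain and
// returns the first ArgumentNullException raised for the 'libraryPath' parameter.
static ArgumentNullException FindNativeLoadFailure(Exception ex)
{
    for (var current = ex; current != null; current = current.InnerException)
    {
        if (current is ArgumentNullException ane && ane.ParamName == "libraryPath")
            return ane;
    }
    return null;
}

// Simulated exception with the same outer/inner shape as the reported trace.
var simulated = new FormatException(
    "Tensorflow exception triggered while loading model.",
    new ArgumentNullException("libraryPath"));

var cause = FindNativeLoadFailure(simulated);
Console.WriteLine(cause != null
    ? "Native library path resolved to null; the native TensorFlow binaries were probably not found."
    : "No native-library load failure found in the exception chain.");
```

In a real notebook the same helper could be called from a catch block wrapped around the LoadTensorFlowModel call, passing in the caught exception instead of the simulated one.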

I realize this is an unconventional configuration, but I'm trying to get as much of ML.NET as possible working in Jupyter notebooks so that C# becomes a viable language for ML training.

mdfarragher commented 4 years ago

Any news on progress? I am presenting an ML.NET training course in Budapest on the 11th of February. This bug is preventing me from demonstrating how to load TensorFlow models to my students.

antoniovs1029 commented 4 years ago

Hi, sorry for the delay. Thanks for reporting the issue. I will take a look into this.

eerhardt commented 4 years ago

@mdfarragher - the issue appears to be that you didn't reference the SciSharp.TensorFlow.Redist NuGet package. This package is necessary to bring in the appropriate TensorFlow native binaries (either CPU-based or GPU-based).

The following notebook works for me on the latest dotnet-interactive in Anaconda:

#r nuget:SciSharp.TensorFlow.Redist
#r nuget:Microsoft.ML
#r nuget:Microsoft.ML.ImageAnalytics
#r nuget:Microsoft.ML.TensorFlow
using Microsoft.ML;
using Microsoft.ML.Data;
using System.IO;

// create a machine learning context
var mlContext = new MLContext();

// load the inception graph
var temp = mlContext.Model.LoadTensorFlowModel(@"C:\Users\eerhardt\Downloads\tensorflow_inception_graph.pb");

Note that the .csproj in the machinelearning-samples repository also has a PackageReference to this package:

https://github.com/dotnet/machinelearning-samples/blob/c4c2632135757110658ad2b502f6d18e8cd7f221/samples/csharp/getting-started/DeepLearning_ImageClassification_TensorFlow/ImageClassification/ImageClassification.Score.csproj#L17

antoniovs1029 commented 4 years ago

Hi, @mdfarragher. I have also been able to load the model without an exception by adding the #r nuget:SciSharp.TensorFlow.Redist directive that Eric suggested.

So I will close this issue. But if you're still having problems with this, please feel free to reopen the issue. Thanks.

mdfarragher commented 4 years ago

I can confirm that LoadTensorFlowModel now works, thanks guys!

Unfortunately the ML.NET object detection sample still crashes a little further down in the code where I set up a prediction engine to estimate the contents of each image in the test set:

var engine = mlContext.Model.CreatePredictionEngine<ImageNetData, ImageNetPrediction>(model);
var images = ImageNetData.ReadFromCsv("images/tags.tsv");
foreach (var image in images)
{
    var prediction = engine.Predict(image).PredictedLabels;
    // .....
}

The call to Predict() throws the following exception:

System.TypeInitializationException: The type initializer for 'Gdip' threw an exception.
 ---> System.TypeInitializationException: The type initializer for 'System.Drawing.LibraryResolver' threw an exception.
 ---> System.InvalidOperationException: A resolver is already set for the assembly.
   at System.Runtime.InteropServices.NativeLibrary.SetDllImportResolver(Assembly assembly, DllImportResolver resolver)
   at System.Drawing.LibraryResolver..cctor()
   --- End of inner exception stack trace ---
   at System.Drawing.SafeNativeMethods.Gdip..cctor()
   --- End of inner exception stack trace ---
   at System.Drawing.SafeNativeMethods.Gdip.GdipCreateBitmapFromFile(String filename, IntPtr& bitmap)
   at System.Drawing.Bitmap..ctor(String filename, Boolean useIcm)
   at System.Drawing.Bitmap..ctor(String filename)
   at Microsoft.ML.Data.ImageLoadingTransformer.Mapper.<>c__DisplayClass4_0.<MakeGetterImageDataViewType>b__0(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImageResizingTransformer.Mapper.<>c__DisplayClass3_0.<MakeGetter>b__1(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImagePixelExtractingTransformer.Mapper.<>c__DisplayClass5_0`1.<GetGetterCore>b__1(VBuffer`1& dst)
   at Microsoft.ML.Transforms.TensorFlowTransformer.TensorValueGetterVec`1.GetTensor()
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.UpdateCacheIfNeeded(Int64 position, ITensorValueGetter[] srcTensorGetters, String[] activeOutputColNames, OutputCache outputCache)
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.<>c__DisplayClass9_0`1.<MakeGetter>b__4(VBuffer`1& dst)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.<>c__DisplayClass8_0`1.<CreateDirectVBufferSetter>b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase`2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine`2.Predict(TSrc example, TDst& prediction)
   at Microsoft.ML.PredictionEngineBase`2.Predict(TSrc example)
   at Submission#7.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
   at System.Drawing.SafeNativeMethods.Gdip.GdipCreateBitmapFromFile(String filename, IntPtr& bitmap)
   at System.Drawing.Bitmap..ctor(String filename, Boolean useIcm)
   at System.Drawing.Bitmap..ctor(String filename)
   at Microsoft.ML.Data.ImageLoadingTransformer.Mapper.<>c__DisplayClass4_0.<MakeGetterImageDataViewType>b__0(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImageResizingTransformer.Mapper.<>c__DisplayClass3_0.<MakeGetter>b__1(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImagePixelExtractingTransformer.Mapper.<>c__DisplayClass5_0`1.<GetGetterCore>b__1(VBuffer`1& dst)
   at Microsoft.ML.Transforms.TensorFlowTransformer.TensorValueGetterVec`1.GetTensor()
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.UpdateCacheIfNeeded(Int64 position, ITensorValueGetter[] srcTensorGetters, String[] activeOutputColNames, OutputCache outputCache)
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.<>c__DisplayClass9_0`1.<MakeGetter>b__4(VBuffer`1& dst)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.<>c__DisplayClass8_0`1.<CreateDirectVBufferSetter>b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase`2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine`2.Predict(TSrc example, TDst& prediction)
   at Microsoft.ML.PredictionEngineBase`2.Predict(TSrc example)
   at Submission#7.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

This seems to be a problem with how ML.NET calls System.Drawing methods on Linux. I tried installing the libgdiplus library into my Jupyter container but that didn't help.

There's a similar-looking issue reported here, but the fix didn't work for me: https://github.com/dotnet/runtime/issues/27200
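The innermost message, "A resolver is already set for the assembly", comes from NativeLibrary.SetDllImportResolver, which accepts only one resolver per assembly. The first trace in this thread shows the notebook host (MLS.Agent.NativeAssemblyLoadHelper) answering DllImport resolution callbacks. One plausible reading, offered as an assumption and not a confirmed diagnosis, is that the host registered a resolver for the System.Drawing assembly before System.Drawing's own LibraryResolver tried to do the same. The sketch below reproduces only the underlying rule on the current assembly; `DemoResolver` is a hypothetical placeholder.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

Assembly target = Assembly.GetExecutingAssembly();

// The first registration for this assembly succeeds.
NativeLibrary.SetDllImportResolver(target, DemoResolver);

bool secondCallThrew = false;
try
{
    // A second registration for the same assembly is rejected by the runtime.
    NativeLibrary.SetDllImportResolver(target, DemoResolver);
}
catch (InvalidOperationException ex)
{
    secondCallThrew = true;
    Console.WriteLine($"Second registration failed: {ex.Message}");
}

Console.WriteLine($"Second call threw: {secondCallThrew}");

// Hypothetical resolver used only for this demo. Returning IntPtr.Zero tells
// the runtime to fall back to its default native library probing.
static IntPtr DemoResolver(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    => IntPtr.Zero;
```

If that reading is right, installing libgdiplus would not change the outcome, because the failure happens while registering the resolver, before any native library is probed.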

My complete notebook is here. The final code block throws the exception: https://github.com/mdfarragher/GLC0220/blob/master/NeuralNetworks/ObjectDetection/ObjectDetection.ipynb

You can launch my code in MyBinder with this link: https://mybinder.org/v2/gh/mdfarragher/GLC0220/master?urlpath=%2Flab%2Ftree%2FNeuralNetworks%2FObjectDetection.ipynb

mdfarragher commented 4 years ago

I created a new issue for the System.Drawing exception as it's not entirely within the scope of this issue. You can find it here: https://github.com/dotnet/machinelearning/issues/4820