dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

System.FormatException: Tensorflow exception triggered while loading model. #4449

Closed benquan closed 4 years ago

benquan commented 4 years ago

System information

Issue

I upgraded from ML.NET 1.3.1 to 1.4.0. Everything worked fine in 1.3.1, but after upgrading I get the following error:

System.FormatException: Tensorflow exception triggered while loading model. ---> System.DllNotFoundException: Unable to load DLL 'tensorflow' or one of its dependencies: The specified module could not be found. (Exception from HRESULT: 0x8007007E)
   at Tensorflow.c_api.TF_NewGraph()
   at Tensorflow.Graph..ctor()
   at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTFSessionByModelFilePath(IExceptionContext ectx, String modelFile, Boolean metaGraph)
   --- End of inner exception stack trace ---
   at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTFSessionByModelFilePath(IExceptionContext ectx, String modelFile, Boolean metaGraph)
   at Microsoft.ML.TensorflowCatalog.LoadTensorFlowModel(ModelOperationsCatalog catalog, String modelLocation)
   at ImageClassification.ModelScorer.TFModelScorer.LoadModel(String dataLocation, String imagesFolder, String modelLocation) in \\Mac\Home\Downloads\QuickID-netcore-sample\ImageClassification\ModelScorer\TFModelScorer.cs:line 94
   at ImageClassification.ModelScorer.TFModelScorer.Score() in \\Mac\Home\Downloads\QuickID-netcore-sample\ImageClassification\ModelScorer\TFModelScorer.cs:line 80
   at ImageClassification.Program.Main(String[] args) in \\Mac\Home\Downloads\QuickID-netcore-sample\ImageClassification\Program.cs:line 22


eerhardt commented 4 years ago

In 1.3.1, Microsoft.ML.TensorFlow implicitly referenced the CPU version of TensorFlow. This caused issues when you wanted to use GPU packages instead.

So starting with 1.4.0, we removed the implicit reference to the CPU package, and now it is up to the application to pick which one it wants: CPU or GPU. Please add one of the following packages to your project:
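As a rough sketch of what that choice looks like in a project file: the CPU package below is the one that appears in the project file later in this thread. The GPU package names in the comment are an assumption and should be checked on NuGet before use. Reference only one of them.

```xml
<ItemGroup>
  <PackageReference Include="Microsoft.ML.TensorFlow" Version="1.4.0" />
  <!-- CPU build of the native TensorFlow library (same package and version
       as in the project file shown later in this thread). -->
  <PackageReference Include="SciSharp.TensorFlow.Redist" Version="1.15.0" />
  <!-- Assumed GPU alternatives, to be used instead of the CPU package:
       SciSharp.TensorFlow.Redist-Windows-GPU or SciSharp.TensorFlow.Redist-Linux-GPU -->
</ItemGroup>
```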

@codemzs - did we put this breaking change in the release notes?

benquan commented 4 years ago

Thanks a lot. It started working in the self-contained app. But now I get the same error when I convert that same app to a DLL and try to access it as an external dependency from a different app. Any idea?

drjahu commented 4 years ago

I also have this problem. I try to run the ImageClassification.Train example using:

najeeb-kazmi commented 4 years ago

For reference, we are talking about this sample.

If this sample project is converted to a class library and then referenced as an external dependency in a different console app, the ML.NET NuGet dependencies still need to be installed in the new console app. These are the same dependencies as those in the sample project.

I was able to exactly replicate the behavior of the above sample by doing the following: change the sample to a class library, create a new console app, reference the sample DLL, and install the required NuGet packages. This is what my project file for the new console app looked like:

  <ItemGroup>
    <PackageReference Include="Microsoft.ML" Version="1.4.0" />
    <PackageReference Include="Microsoft.ML.ImageAnalytics" Version="1.4.0" />
    <PackageReference Include="Microsoft.ML.TensorFlow" Version="1.4.0" />
    <PackageReference Include="SciSharp.TensorFlow.Redist" Version="1.15.0" />
  </ItemGroup>

  <ItemGroup>
    <Reference Include="ImageClassification.Score">
      <HintPath>..\..\..\..\..\..\najeeb-kazmi\machinelearning-samples\samples\csharp\getting-started\DeepLearning_ImageClassification_TensorFlow\ImageClassification\bin\Debug\netcoreapp2.1\ImageClassification.Score.dll</HintPath>
    </Reference>
  </ItemGroup>

Please feel free to reopen if this is still an issue.

Gaopeng-Bai commented 3 years ago

I get the same error when I deploy to Azure App Service, but it works on my local machine, and I am struggling to find a solution. Generally, if it works on the local machine, it should work on the Azure server too.

Has anyone solved a similar issue?

Packages:

Oceania2018 commented 3 years ago

@Gaopeng-Bai Can you try downgrading the TensorFlow redist to v1.15.1 and TensorFlow.NET to v0.11?

Gaopeng-Bai commented 3 years ago

> @Gaopeng-Bai Can you try downgrade the Tensorflow redist to v1.15.1 and v0.11 for TensorFlow .NET ?

That works on my local machine as well, but the Azure server throws the same error when I debug on the App Service.

The problem might lie with Azure App Service, but I have no idea why it happens there. I deployed with the Publish button in Visual Studio, so it should work as usual. Do you have any idea why this happens?

Oceania2018 commented 3 years ago

@Gaopeng-Bai try v2.3.1.

Felipemfaria commented 3 years ago

@Gaopeng-Bai I have the same problem. It works on my local machine, but not when I publish to Azure. Did you solve this issue?

Gaopeng-Bai commented 3 years ago

> @Gaopeng-Bai I have the same problem. It works on local machine but it doesn't when i publish on Azure. Did you solve this issue?

Yes, I changed the platform of the Azure App Service from 32-bit to 64-bit, and that solved the problem. I hope this helps you as well.
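For context, here is a hedged sketch of a project-side counterpart to that setting. It assumes the native TensorFlow binary shipped in the redist package is 64-bit only, which would explain a DllNotFoundException in a 32-bit worker process. The thread does not confirm this. Under that assumption, pinning the build to x64 makes the intended architecture explicit:

```xml
<PropertyGroup>
  <!-- Assumption: the native tensorflow library is only available for x64. -->
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```

The App Service platform setting still has to be 64-bit for the app to run with this in place.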

Felipemfaria commented 3 years ago

> > @Gaopeng-Bai I have the same problem. It works on local machine but it doesn't when i publish on Azure. Did you solve this issue?
>
> Yes, I changed the platform of the Azure app service from 32-bit to 64-bit, then the problem solved. Hope this will help you as well.

I made this change too, but it still doesn't work. Thanks for your help. I will keep trying to find a solution.