dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License
1.36k stars 177 forks source link

Method not found: 'Void Module.train()'. #666

Closed drGarbinsky closed 2 years ago

drGarbinsky commented 2 years ago

I'm currently trying to get the same running from this page: https://devblogs.microsoft.com/dotnet/introducing-the-ml-dotnet-text-classification-api-preview/ working with: dotnet: 7.0.100-preview.6.22352.1 Microsoft.ML Version="2.0.0-preview.22313.1" Microsoft.ML.TorchSharp Version="0.20.0-preview.22313.1" TorchSharp-cpu Version="0.97.0"

WSL ubuntu: 20.04.3 LTS

code:

using Microsoft.ML; using Microsoft.ML.TorchSharp;

// Initialize MLContext var mlContext = new MLContext();

// Load your data var reviews = new[] { new {Text = "This is a bad steak", Sentiment = "Negative"}, new {Text = "I really like this restaurant", Sentiment = "Positive"} };

var reviewsDV = mlContext.Data.LoadFromEnumerable(reviews);

//Define your training pipeline var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Sentiment") .Append(mlContext.MulticlassClassification.Trainers.TextClassification(numberOfClasses: 2, sentence1ColumnName: "Text")) .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

// Train the model var model = pipeline.Fit(reviewsDV);

KernAlan commented 2 years ago

I'm experiencing the same issue right now.

JReed221 commented 2 years ago

I am experiencing this issue as well

drGarbinsky commented 2 years ago

I've noticed that this notebook references a newer version of the packages but I am unable to locate them

i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json"

r "nuget:Microsoft.ML,2.0.0-preview.22324.1"

r "nuget:Microsoft.ML.TorchSharp,0.20.0-preview.22324.1"

r "nuget:TorchSharp-cpu,0.96.7"

r "nuget:Microsoft.Data.Analysis,0.20.0-preview.22324.1"

I have also reproduced this issue within the nightly docker image so I don't think it is an environment issue

drGarbinsky commented 2 years ago

I was able to work around the issue by rolling back package versions:

JReed221 commented 2 years ago

that worked! thank you!

NiklasGustafsson commented 2 years ago

I believe that this is likely due to the fact that there was a breaking change made to Module.train() -- the previous version did not take an optional bool, which the Pytorch version does. That does not break source code dependencies, but breaks any binary dependency, since the signature has changed.

Adding @luisquintanilla and @michaelgsharp to consult on what to do about this.

NiklasGustafsson commented 2 years ago

A potential fix is to not make the bool optional, but to have two train() declarations -- one with, and one without. The latter would unbreak the binary dependency, I believe.

luisquintanilla commented 2 years ago

Hi all,

Sorry to hear you ran into issues.

I want to add clarity to the versions used in the notebook. The version in the notebook comes from the ML.NET daily feed. That version enables the use of the built-in Evaluate method in ML.NET and also makes it easier to work with the API. Previously you had to manually enter the number of classes you wanted to predict and couldn't use the evaluate method. This is the link to the daily feed if you want to get access to the latest versions of the ML.NET packages.

https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json

That's a good suggestion @NiklasGustafsson. I also wonder if another alternative would be to instruct users to reference a specific version of TorchSharp (as @drGarbinsky has done to work around the issue). We've done something similar with TensorFlow.NET in the past. What would be the disadvantages of doing that (other than your code won't run using the latest version of TorchSharp)?

NiklasGustafsson commented 2 years ago

What would be the disadvantages of doing that (other than your code won't run using the latest version of TorchSharp)?

That would be the disadvantage, in addition to the inconvenience of having to keep track of which version numbers work and not.

NiklasGustafsson commented 2 years ago

@tarekgh -- what I suggested above should take care of it, right? The old train() was virtual, the new will not be -- that should be okay, right?

luisquintanilla commented 2 years ago

What would be the disadvantages of doing that (other than your code won't run using the latest version of TorchSharp)?

That would be the disadvantage, in addition to the inconvenience of having to keep track of which version numbers work and not.

Thanks for clarifying @NiklasGustafsson. Let's explore your initial suggestion - "not make the bool optional, but to have two train() declarations -- one with, and one without. "

NiklasGustafsson commented 2 years ago

Hmmm.

The change I was thinking of was introduced in 0.97.0 last week, so if you have to go all the way back to 0.96.3 to fix it, that's very odd. @luisquintanilla -- is there any way you could try it out with 0.96.8?

tarekgh commented 2 years ago

@NiklasGustafsson [Changes that affect compatibility](https://docs.microsoft.com/en-us/dotnet/core/compatibility/?ranMID=43674&ranEAID=rl2xnKiLcHs&ranSiteID=rl2xnKiLcHs-Vk6trHPJPdwOyqxWVzHUQQ&epi=rl2xnKiLcHs-Vk6trHPJPdwOyqxWVzHUQQ&irgwc=1&OCID=AID2200057_aff_7795_1243925&tduid=(ir__2m3q0nl02wkf6gcatnnkkvci0e2xve3sx63xgf9200)(7795)(1243925)(rl2xnKiLcHs-Vk6trHPJPdwOyqxWVzHUQQ)()&irclickid=_2m3q0nl02wkf6gcatnnkkvci0e2xve3sx63xgf9200) doc mentioning the following:

❌ DISALLOWED: Removing the virtual keyword from a member

I guess the compiler will not bind to the method if the virtual removed from the member.

@luisquintanilla should we just publish the latest of ML.NET to the NuGet instead of having it in the internal feed and recommend the users to use the latest?

NiklasGustafsson commented 2 years ago

Another approach is to rebuild the ML.NET preview with 0.97.0 and pick up the new method definition.

NiklasGustafsson commented 2 years ago

We should probably track this as an issue with ML.NET, too.

tarekgh commented 2 years ago

We should probably track this as an issue with ML.NET, too.

Yes.

Another approach is to rebuild the ML.NET preview with 0.97.0 and pick up the new method definition.

Is this the same as what I suggested to publish the latest ML.NET to NuGet which will make it works with Torch Sharp 0.97.0? and we recommend users use the latest version of ML.NET.

NiklasGustafsson commented 2 years ago

Is this the same as what I suggested to publish the latest ML.NET to NuGet which will make it works with Torch Sharp 0.97.0? and we recommend users use the latest version of ML.NET.

Not sure. The issue should only affect binary dependencies, so if ML.NET is rebuilt against 0.97.0, how it is distributed is orthogonal, I believe.

tarekgh commented 2 years ago

What I meant is update the Torch Sharp version in the file https://github.com/dotnet/machinelearning/blob/4aad15b8f35b94554b2c5f7062d65a71e1b379b0/eng/Versions.props#L52 and then publish the new built ML.NET to NuGet.

tarekgh commented 2 years ago

I moved the issue to ML repo https://github.com/dotnet/machinelearning/issues/6263. Please continue to follow up and comment there.

@NiklasGustafsson I don't have permission to close the issue here. Could you please close it?

NiklasGustafsson commented 2 years ago

Closing, per @tarekgh's request.

drGarbinsky commented 2 years ago

Thanks for all the great input and work on this!

NiklasGustafsson commented 2 years ago

I was able to work around the issue by rolling back package versions:

<PackageReference Include="libtorch-cpu-linux-x64" Version="1.11.0.1" />
<PackageReference Include="Microsoft.Data.Analysis" Version="0.20.0-preview.22313.1" />
<PackageReference Include="Microsoft.ML" Version="2.0.0-preview.22313.1" />
<PackageReference Include="Microsoft.ML.TorchSharp" Version="0.20.0-preview.22313.1" />
<PackageReference Include="TorchSharp-cpu" Version="0.96.3" />

Does it work with 0.96.8?

drGarbinsky commented 2 years ago

https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json

@luisquintanilla I must be missing something because that feed looks like it hasn't been updated in a long time. Is that the correct URL?

tarekgh commented 2 years ago

@drGarbinsky yes https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json is the correct internal feed for machinelearning repo packages. what exactly the updates you are waiting for?

drGarbinsky commented 2 years ago

Ignore that. I found what I was looking for, thanks.