dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.02k stars 1.88k forks source link

Method not found: 'Void Module.train()'. #6263

Closed tarekgh closed 1 year ago

tarekgh commented 2 years ago

This issue is originally reported by @drGarbinsky in TorchSharp repo and ported here.

I'm currently trying to get the same running from this page: devblogs.microsoft.com/dotnet/introducing-the-ml-dotnet-text-classification-api-preview working with: dotnet: 7.0.100-preview.6.22352.1 Microsoft.ML Version="2.0.0-preview.22313.1" Microsoft.ML.TorchSharp Version="0.20.0-preview.22313.1" TorchSharp-cpu Version="0.97.0"

WSL ubuntu: 20.04.3 LTS

code:

using Microsoft.ML;
using Microsoft.ML.TorchSharp;

// Initialize MLContext
var mlContext = new MLContext();

// Load your data
var reviews = new[]
{
new {Text = "This is a bad steak", Sentiment = "Negative"},
new {Text = "I really like this restaurant", Sentiment = "Positive"}
};

var reviewsDV = mlContext.Data.LoadFromEnumerable(reviews);

//Define your training pipeline
var pipeline =
mlContext.Transforms.Conversion.MapValueToKey("Label", "Sentiment")
.Append(mlContext.MulticlassClassification.Trainers.TextClassification(numberOfClasses: 2, sentence1ColumnName: "Text"))
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

// Train the model
var model = pipeline.Fit(reviewsDV);

@KernAlan wrote:

I'm experiencing the same issue right now.

@JReed221 wrote:

I am experiencing this issue as well

@drGarbinsky wrote:

I've noticed that this notebook references a newer version of the packages but I am unable to locate them

i "nuget:pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json"

r "nuget:Microsoft.ML,2.0.0-preview.22324.1"

r "nuget:Microsoft.ML.TorchSharp,0.20.0-preview.22324.1"

r "nuget:TorchSharp-cpu,0.96.7"

r "nuget:Microsoft.Data.Analysis,0.20.0-preview.22324.1"

I have also reproduced this issue within the nightly docker image so I don't think it is an environment issue

@drGarbinsky wrote

I was able to work around the issue by rolling back package versions:

<PackageReference Include="libtorch-cpu-linux-x64" Version="1.11.0.1" />
<PackageReference Include="Microsoft.Data.Analysis" Version="0.20.0-preview.22313.1" />
<PackageReference Include="Microsoft.ML" Version="2.0.0-preview.22313.1" />
<PackageReference Include="Microsoft.ML.TorchSharp" Version="0.20.0-preview.22313.1" />
<PackageReference Include="TorchSharp-cpu" Version="0.96.3" />

@JReed221 wrote

that worked! thank you!

@NiklasGustafsson wrote

I believe that this is likely due to the fact that there was a breaking change made to Module.train() -- the previous version did not take an optional bool, which the Pytorch version does. That does not break source code dependencies, but breaks any binary dependency, since the signature has changed.

Adding @luisquintanilla and @michaelgsharp to consult on what to do about this.

@NiklasGustafsson wrote

A potential fix is to not make the bool optional, but to have two train() declarations -- one with, and one without. The latter would unbreak the binary dependency, I believe.

@luisquintanilla wrote:

Hi all,

Sorry to hear you ran into issues.

I want to add clarity to the versions used in the notebook. The version in the notebook comes from the ML.NET daily feed. That version enables the use of the built-in Evaluate method in ML.NET and also makes it easier to work with the API. Previously you had to manually enter the number of classes you wanted to predict and couldn't use the evaluate method. This is the link to the daily feed if you want to get access to the latest versions of the ML.NET packages.

pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json

That's a good suggestion @NiklasGustafsson. I also wonder if another alternative would be to instruct users to reference a specific version of TorchSharp (as @drGarbinsky has done to work around the issue). We've done something similar with TensorFlow.NET in the past. What would be the disadvantages of doing that (other than your code won't run using the latest version of TorchSharp)?

@NiklasGustafsson wrote:

What would be the disadvantages of doing that (other than your code won't run using the latest version of TorchSharp)?

That would be the disadvantage, in addition to the inconvenience of having to keep track of which version numbers work and not.

@NiklasGustafsson wrote:

Hmmm.

The change I was thinking of was introduced in 0.97.0 last week, so if you have to go all the way back to 0.96.3 to fix it, that's very odd. @luisquintanilla -- is there any way you could try it out with 0.96.8?

@tarekgh wrote:

@NiklasGustafsson [Changes that affect compatibility](https://docs.microsoft.com/en-us/dotnet/core/compatibility/?ranMID=43674&ranEAID=rl2xnKiLcHs&ranSiteID=rl2xnKiLcHs-Vk6trHPJPdwOyqxWVzHUQQ&epi=rl2xnKiLcHs-Vk6trHPJPdwOyqxWVzHUQQ&irgwc=1&OCID=AID2200057_aff_7795_1243925&tduid=(ir__2m3q0nl02wkf6gcatnnkkvci0e2xve3sx63xgf9200)(7795)(1243925)(rl2xnKiLcHs-Vk6trHPJPdwOyqxWVzHUQQ)()&irclickid=_2m3q0nl02wkf6gcatnnkkvci0e2xve3sx63xgf9200) doc mentioning the following:

❌ DISALLOWED: Removing the virtual keyword from a member

I guess the compiler will not bind to the method if the virtual removed from the member.

@luisquintanilla should we just publish the latest of ML.NET to the NuGet instead of having it in the internal feed and recommend the users to use the latest?

@NiklasGustafsson wrote:

Another approach is to rebuild the ML.NET preview with 0.97.0 and pick up the new method definition.

@NiklasGustafsson wrote:

We should probably track this as an issue with ML.NET, too.

@tarekgh wrote:

We should probably track this as an issue with ML.NET, too.

Yes.

Another approach is to rebuild the ML.NET preview with 0.97.0 and pick up the new method definition.

Is this the same as what I suggested to publish the latest ML.NET to NuGet which will make it works with Torch Sharp 0.97.0? and we recommend users use the latest version of ML.NET.

@NiklasGustafsson wrote:

Is this the same as what I suggested to publish the latest ML.NET to NuGet which will make it works with Torch Sharp 0.97.0? and we recommend users use the latest version of ML.NET.

Not sure. The issue should only affect binary dependencies, so if ML.NET is rebuilt against 0.97.0, how it is distributed is orthogonal, I believe.

@tarekgh wrote:

What I meant is update the Torch Sharp version in the file dotnet/machinelearning@4aad15b/eng/Versions.props#L52 and then publish the new built ML.NET to NuGet.

NiklasGustafsson commented 2 years ago

@luisquintanilla -- one advantage of moving to 0.97.0, besides undoing the breakage, will be that ML.NET should be able to import TorchScript modules, and then use them for (re-)training as well as inference.

aforoughi1 commented 2 years ago

Do you have a timeline to fix the issue? Still broken under Microsoft.ML.AutoML" 0.20.0-preview.22470.1 Microsoft.ML.TorchSharp" 0.20.0-preview.22470.1 TorchSharp 0.96.7

tarekgh commented 2 years ago

@aforoughi1 we'll try to look at this soon, around next week or so. Thanks for the reminder.

NiklasGustafsson commented 2 years ago

I should merely be a question of rebuilding ML.NET against a newer version of TorchSharp

ericstj commented 1 year ago

Thank you @michaelgsharp for making this change. Can we close this issue now?

One thought here that we could consider in the future is flowing TorchSharp through darc, the way other dotnet libraries flow their bits between repos (auto-PRs, channels, etc). That would require some work on the TorchSharp side first since it looks like it doesn't use arcade for build and publishing.

tarekgh commented 1 year ago

Closing this issue as I believe it is already fixed.