dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Using a target size ([32]) that is different to the input size ([32, 1]). #6522

Open ic202 opened 1 year ago

ic202 commented 1 year ago

System Information (please complete the following information):

Describe the bug

When trying to train SentenceSimilarity, I get this warning:

Warning: Using a target size ([32]) that is different to the input size ([32, 1]). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. (function mse_loss)
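To illustrate why this warning matters, here is a minimal sketch in plain Python. It is not the ML.NET or TorchSharp code; the helper names `mse_elementwise` and `mse_broadcast` are hypothetical and only mimic what broadcasting does when a `[N, 1]` prediction meets a `[N]` target: both are expanded to `[N, N]`, so every prediction is compared with every target instead of only its own.

```python
def mse_elementwise(pred, target):
    """MSE when prediction i is compared only with target i (shapes [N] and [N])."""
    assert len(pred) == len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_broadcast(pred_column, target):
    """MSE when a [N, 1] prediction is broadcast against a [N] target.

    Broadcasting expands both operands to [N, N], so the mean is taken
    over all N * N (prediction, target) pairs.
    """
    total = sum((row[0] - t) ** 2 for row in pred_column for t in target)
    return total / (len(pred_column) * len(target))


target = [1.0, 2.0, 3.0]                     # shape [3]
pred_column = [[1.0], [2.0], [3.0]]          # shape [3, 1], a perfect prediction
pred_flat = [row[0] for row in pred_column]  # same values, reshaped to [3]

loss_ok = mse_elementwise(pred_flat, target)
loss_broadcast = mse_broadcast(pred_column, target)

print(loss_ok)         # 0.0: the prediction is perfect
print(loss_broadcast)  # about 1.33: nonzero loss for the same perfect prediction
```

In this toy case the mismatched shapes report a nonzero loss for a perfect prediction, which is the kind of incorrect result the warning describes. Whether this is what degrades the metrics in this issue is not established here.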

dcostea commented 1 year ago

@ic202 I'm wondering, is this bug impacting the accuracy of the trained model?

ic202 commented 1 year ago

I get bad results anyway. That is the reason why I asked the question.

I tried to train the model to calculate how well a paragraph of text corresponds to the question that was asked.

dcostea commented 1 year ago

@ic202 Same here, I get terrible metrics, so I have to postpone everything related to text classification until I get clarification on this. Thank you.

dcostea commented 1 year ago

@ic202 If you use the most recent preview version (2.0.0-preview.22525.3) instead of the final version 2.0.0, the warnings are gone. Use this NuGet feed to get all the preview versions: https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json

michaelgsharp commented 1 year ago

@dcostea The warnings go away, but what about the metrics? Did anything change with the latest version?

dcostea commented 1 year ago

@michaelgsharp

Unfortunately the metrics are still bad.

With TorchSharp-cuda-windows 0.96.7

<PackageReference Include="Microsoft.ML" Version="2.0.0" />
<PackageReference Include="Microsoft.ML.TorchSharp" Version="0.20.0" />
<PackageReference Include="TorchSharp-cuda-windows" Version="0.96.7" />
<PackageReference Include="MathNet.Numerics.Signed" Version="5.0.0" />

Spearman Correlation: 0,03727262575393465

<PackageReference Include="Microsoft.ML" Version="2.0.1" />
<PackageReference Include="Microsoft.ML.TorchSharp" Version="0.20.1" />
<PackageReference Include="TorchSharp-cuda-windows" Version="0.96.7" />
<PackageReference Include="MathNet.Numerics.Signed" Version="5.0.0" />

Spearman Correlation: 0,04932447400525287

<PackageReference Include="Microsoft.ML" Version="3.0.0-preview.23106.1" />
<PackageReference Include="Microsoft.ML.TorchSharp" Version="0.21.0-preview.23106.1" />
<PackageReference Include="TorchSharp-cuda-windows" Version="0.96.7" />
<PackageReference Include="MathNet.Numerics.Signed" Version="5.0.0" />

Spearman Correlation: 0,046055858106189174

With TorchSharp-cuda-windows 0.98.3

<PackageReference Include="Microsoft.ML" Version="3.0.0-preview.23106.1" />
<PackageReference Include="Microsoft.ML.TorchSharp" Version="0.21.0-preview.23106.1" />
<PackageReference Include="TorchSharp-cuda-windows" Version="0.98.3" />
<PackageReference Include="MathNet.Numerics.Signed" Version="5.0.0" />

Spearman Correlation: 0,009265691203680072

And some prints:

What do you want to say about? (Type Q to Quit) metal sheet
Similarity to sheet metal: 0 0 2,2571285

What do you want to say about? (Type Q to Quit) popcorn
Similarity to sheet metal: 0 0 2,2570286

cyberkoolman commented 1 year ago

Any updates on this? Running the official mlnet-sample's home-depot-sentence-similarity GitHub repo prints out the same warnings. It also reports a low Pearson correlation of 0.11 at the end of the run.
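For readers comparing the Spearman and Pearson figures quoted in this thread, here is a rough sketch of how the two correlations can be computed, written in plain Python rather than with the MathNet.Numerics calls the samples presumably use. The function names and the toy data are assumptions for illustration only. Pearson measures linear agreement between predicted scores and labels; Spearman is Pearson applied to the ranks, so it only measures whether the ordering agrees.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant inputs
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy data (made up): predictions are monotone in the labels but not linear.
labels = [1.0, 2.0, 3.0, 4.0]
predictions = [1.0, 4.0, 9.0, 16.0]

pearson_value = pearson(predictions, labels)
spearman_value = spearman(predictions, labels)
constant_value = pearson([2.0, 2.0, 2.0, 2.0], labels)

print(pearson_value)   # below 1: the relationship is not linear
print(spearman_value)  # about 1: the ordering agrees exactly
print(constant_value)  # nan: constant predictions carry no ordering information
```

Values near 0, like those reported above, mean the predicted similarities are nearly unrelated to the labels in both ordering and magnitude.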

luisquintanilla commented 1 year ago

Hi all,

Sorry to hear this is an issue for you. Are you using your own datasets or the home depot dataset?

If you are using the home depot dataset, the poor performance is unfortunately to be expected, mainly because of the dataset itself.

If there are other datasets out there you would recommend we include in the samples, please let us know and we'd be happy to update them.