dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.

"Label" for One-Class Matrix Factorization #873

Open sergey-tihon opened 3 years ago

sergey-tihon commented 3 years ago

This project contains a sample, MatrixFactorization_ProductRecommendation, for "One-Class Matrix Factorization".

In this sample, traindata is loaded from a two-column file, and one more column, Label, is added to the dataset: https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/getting-started/MatrixFactorization_ProductRecommendation/ProductRecommender/Program.cs#L31-L39

var traindata = mlContext.Data.LoadFromTextFile(
    path: TrainingDataLocation,
    columns: new[]
    {
        new TextLoader.Column("Label", DataKind.Single, 0), // HERE
        new TextLoader.Column(name: nameof(ProductEntry.ProductID), dataKind: DataKind.UInt32, source: new[] { new TextLoader.Range(0) }, keyCount: new KeyCount(262111)),
        new TextLoader.Column(name: nameof(ProductEntry.CoPurchaseProductID), dataKind: DataKind.UInt32, source: new[] { new TextLoader.Range(1) }, keyCount: new KeyCount(262111))
    },
    hasHeader: true,
    separatorChar: '\t');

When the column is added, it is filled with NaNs.

According to the documentation for the MatrixFactorizationTrainer class:

The coordinate descent method included is specifically for one-class matrix factorization where all observed ratings are positive signals (that is, all rating values are 1). Notice that the only way to invoke one-class matrix factorization is to assign one-class squared loss to loss function when calling MatrixFactorization(Options). See Page 6 and Page 28 here for a brief introduction to standard matrix factorization and one-class matrix factorization.

Page 28 of the linked paper also states that the 'One-Class Matrix Factorization' method is used when we know only positive ratings/samples (1s).
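
For reference, here is a rough sketch of the one-class objective as I read it from the documentation and the linked slides. The notation is introduced here for illustration and is not taken from the ML.NET source: $p_u$ and $q_v$ are the latent vectors for row $u$ and column $v$, $\Omega^{+}$ is the set of observed pairs, $\Omega^{-}$ is the set of unobserved pairs, and $\alpha$, $c$, $\lambda$ are assumed to correspond to the `Alpha`, `C`, and `Lambda` trainer options. The exact form of the regularization term may differ in the actual implementation.

```latex
\min_{P,\,Q}\;
  \sum_{(u,v)\in\Omega^{+}} \bigl(1 - p_u^{\top} q_v\bigr)^{2}
  \;+\; \alpha \sum_{(u,v)\in\Omega^{-}} \bigl(c - p_u^{\top} q_v\bigr)^{2}
  \;+\; \lambda \bigl(\lVert P \rVert_F^{2} + \lVert Q \rVert_F^{2}\bigr)
```

In this form the target for every observed pair is the constant 1, which is why I would expect the Label column to contain 1s rather than NaNs.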

Why does the MatrixFactorization_ProductRecommendation sample not fill the Label column with all 1s before matrix factorization?

// cc @CESARDELATORRE

Update: Here is a more detailed explanation

lqdev commented 3 years ago

@luisquintanilla

lqdev commented 3 years ago

Thanks for opening this issue @sergey-tihon. I've tagged myself and will take a look.

sergey-tihon commented 3 years ago

Thank you @lqdev! I would appreciate it if you could suggest a simpler way to add a Label column filled with all 1s to the dataset.

The only option that I have found is:

// define a new type that holds the Label value
type LabelColumn() =
   member val Label = 1.0f with get, set

// define a mapping function that sets Label to 1.0f for every row
let labelMapping = Action<_,_>(fun (input:Product) (output:LabelColumn) -> output.Label <- 1.0f)

// add the custom mapping to the pipeline
.Append(context.Transforms.CustomMapping<Product, LabelColumn>(labelMapping, contractName = null))

sergey-tihon commented 2 years ago

@lqdev / @luisquintanilla did you have a chance to take a look? ;)