dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

No sample for ProjectToPrincipalComponents() available: How to use it with SchemaDefinition? (System.ArgumentOutOfRangeException: 'Schema mismatch for input column') #974

Open lucas-albs opened 1 year ago

lucas-albs commented 1 year ago

I am trying to transform a dataview by calculating a PCA using the method ProjectToPrincipalComponents.

Each object is defined as follows:

public class thisItem
{
    public int itemName { get; set; }
    [ColumnName("Prices")]
    public double[] Prices { get; set; }
}

Then I have a list:

List<thisItem> allItems = new();

I want to read this list in ML.NET:

var mlContext = new MLContext();
int numberOfFeatures = allItems.FirstOrDefault().Prices.Count();
SchemaDefinition schemaDef = SchemaDefinition.Create(typeof(thisItem), SchemaDefinition.Direction.Both);
PrimitiveDataViewType itemType = ((VectorDataViewType)schemaDef["Prices"].ColumnType).ItemType;
schemaDef["Prices"].ColumnType = new VectorDataViewType(itemType, numberOfFeatures);

IDataView dataView = mlContext.Data.LoadFromEnumerable(allItems, schemaDef);

Microsoft.ML.Transforms.PrincipalComponentAnalyzer pipeline = mlContext.Transforms.ProjectToPrincipalComponents
    (outputColumnName: "Prices", inputColumnName: "Prices", rank: 10, seed: 1);

ITransformer datatransf = pipeline.Fit(dataView);

As soon as it runs the last line, I get the error: System.ArgumentOutOfRangeException: 'Schema mismatch for input column 'Prices': expected known-size vector of Single of two or more items, got Vector<Double, 17> (Parameter 'inputSchema')'

What could be wrong? I've been on this for hours and have read all the documentation, and all the GitHub examples I found seem to be out of date.
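The error message says the PCA estimator expects a known-size vector of Single, while the loaded "Prices" column is Vector<Double, 17>. One possible way around this, sketched below and not confirmed anywhere in this thread, is to convert the column to Single with ConvertType before the PCA step. The class name ThisItem, the column names "PricesSingle" and "PricesPca", and the random sample data are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class ThisItem
{
    public int itemName { get; set; }
    public double[] Prices { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample data: 20 rows with 17 prices each.
        var rng = new Random(1);
        List<ThisItem> allItems = Enumerable.Range(0, 20)
            .Select(i => new ThisItem
            {
                itemName = i,
                Prices = Enumerable.Range(0, 17).Select(_ => rng.NextDouble()).ToArray()
            })
            .ToList();

        var mlContext = new MLContext(seed: 1);
        int numberOfFeatures = allItems[0].Prices.Length;

        // Same idea as in the question: give the vector column a known size at runtime.
        var schemaDef = SchemaDefinition.Create(typeof(ThisItem), SchemaDefinition.Direction.Both);
        var itemType = ((VectorDataViewType)schemaDef["Prices"].ColumnType).ItemType;
        schemaDef["Prices"].ColumnType = new VectorDataViewType(itemType, numberOfFeatures);

        IDataView dataView = mlContext.Data.LoadFromEnumerable(allItems, schemaDef);

        // Convert Vector<Double, 17> to Vector<Single, 17>, then project onto 10 components.
        var pipeline = mlContext.Transforms.Conversion
            .ConvertType("PricesSingle", "Prices", DataKind.Single)
            .Append(mlContext.Transforms.ProjectToPrincipalComponents(
                outputColumnName: "PricesPca",
                inputColumnName: "PricesSingle",
                rank: 10,
                seed: 1));

        ITransformer model = pipeline.Fit(dataView);
        IDataView transformed = model.Transform(dataView);

        // Print the type of the projected column to check the result.
        Console.WriteLine(transformed.Schema["PricesPca"].Type);
    }
}
```

An alternative that avoids the extra conversion step would be to declare Prices as float[] in the class itself, so the column is loaded as Single from the start. Which option fits better depends on whether the double values are needed elsewhere.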