dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Add support for multi-dimensional arrays for model input/output. #6066

Open jannickj opened 2 years ago

jannickj commented 2 years ago

I have a fully working TensorFlow model and I literally just need the last step of having C# run my model, but I am stuck on a null exception.

I have a very simple setup, and I've locked down both sequence length and batch size; however, no matter what I do, it gives me this exception:

   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.<>c__DisplayClass8_0`1.<CreateDirectVBufferSetter>b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase`2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine`2.Predict(TSrc example, TDst& prediction)
   at MyProject.Model.Run() in 

I have tested that the model works in Python and I've made 100% sure the dimensions fit exactly.

public record Features
{
    [ColumnName("x_1")]
    [VectorType(1, 41, 3)]
    public int[,,] UnigramWindows { get; set; } = null!;
    [ColumnName("x_2")]
    [VectorType(1, 41, 3)]
    public int[,,] BigramWindows { get; set; } = null!;
    [ColumnName("x_3")]
    [VectorType(1, 41, 3)]
    public int[,,] CharTypeWindows { get; set; } = null!;
    [ColumnName("x_4")]
    [VectorType(1, 41, 41)]
    public int[,,] WordsStartingAt { get; set; } = null!;
    [ColumnName("x_5")]
    [VectorType(1, 41, 41)]
    public int[,,] WordsEndingAt { get; set; } = null!;
    [ColumnName("x")]
    [VectorType(1)]
    public int[] SeqLen { get; set; } = null!;
}

private record Output
{
    [VectorType(1, 41, 6)]
    public float[,,] Identity;
}

private static ITransformer LoadModel(
    MLContext mlContext,
    string modelPath)
{
    var tfModel = mlContext.Model
        .LoadTensorFlowModel(modelPath);
    var schema = tfModel.GetModelSchema();
    var revSchema = schema.Reverse().ToArray();
    var pipeline =
        tfModel
        .ScoreTensorFlowModel(
                outputColumnNames: new[] { "Identity" },
                inputColumnNames:
                new[] {
                    "x",
                    "x_1",
                    "x_2",
                    "x_3",
                    "x_4",
                    "x_5",
                },
                addBatchDimensionInput: false);

    var dataView = mlContext.Data.LoadFromEnumerable(Enumerable.Empty<Features>());
    ITransformer mlModel = pipeline.Fit(dataView);

    return mlModel;
}

public static void Run()
{
    var model = LoadModel(mlContext, "model.pb");
    var predictionEngine = mlContext
        .Model
        .CreatePredictionEngine<Features, Output>(model);

    var res = predictionEngine.Predict(features);

    Console.WriteLine(System.Text.Json.JsonSerializer.Serialize(res));
}

jannickj commented 2 years ago

To any unfortunate soul who has had to deal with the same issue: I finally figured it out. Multi-dimensional arrays are not supported in ML.NET; you're supposed to flatten the arrays yourself ><
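
For readers landing here, a minimal sketch of the flattening step described above. The `TensorFlattening` class and `Flatten` method are hypothetical names, not part of ML.NET. The sketch assumes the model expects row-major (C-order) data, which is the order in which .NET enumerates rectangular arrays, and that the flattened property is declared as a one-dimensional array (for example `int[]`) while the `[VectorType(1, 41, 3)]` attribute keeps describing the logical shape.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not an ML.NET API.
public static class TensorFlattening
{
    // Copies every element of a rectangular array (int[,,], float[,], ...)
    // into a new one-dimensional array. Enumerating a rectangular array
    // visits elements with the last index varying fastest (row-major).
    public static T[] Flatten<T>(Array source) => source.Cast<T>().ToArray();
}

// Assumed shape of the corrected input class (requires Microsoft.ML):
//
//     [ColumnName("x_1")]
//     [VectorType(1, 41, 3)]
//     public int[] UnigramWindows { get; set; } = null!;
```

Usage would look like `features.UnigramWindows = TensorFlattening.Flatten<int>(windows);`, where `windows` is the original `int[,,]` of shape (1, 41, 3).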

michaelgsharp commented 2 years ago

@jannickj sorry for the confusion about this. This is something that we are discussing. As we are working on adding TorchSharp support to ML.NET, this will probably become a larger issue, so we are planning on revisiting this and discussing it again in the future.

jannickj commented 2 years ago

I think a simple fix for now would be to throw an exception that says multidimensional arrays are not supported. The problem is that a null exception makes the cause very hard to figure out.
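
Until something like that exists in the library, a check of this kind can be approximated in user code. The following is only a sketch: `SchemaGuard` and `ThrowIfMultiDimensional` are hypothetical names, and the check relies solely on standard reflection (`Type.IsArray`, `Type.GetArrayRank`), not on any ML.NET API. It inspects the public instance properties and fields of an input or output class and fails early with a descriptive message.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical user-side guard, not part of ML.NET.
public static class SchemaGuard
{
    public static void ThrowIfMultiDimensional<T>()
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        // Collect (name, type) pairs for public properties and fields.
        var members = typeof(T).GetProperties(flags)
            .Select(p => (p.Name, Type: p.PropertyType))
            .Concat(typeof(T).GetFields(flags)
                .Select(f => (f.Name, Type: f.FieldType)));

        foreach (var (name, type) in members)
        {
            // Rectangular arrays such as int[,,] report a rank above 1.
            if (type.IsArray && type.GetArrayRank() > 1)
            {
                throw new NotSupportedException(
                    $"{typeof(T).Name}.{name} is a {type.GetArrayRank()}-dimensional array; " +
                    "flatten it to a one-dimensional array first.");
            }
        }
    }
}
```

Calling `SchemaGuard.ThrowIfMultiDimensional<Features>()` before `CreatePredictionEngine` would then surface the problem as a readable exception instead of a null exception deep inside `Predict`.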

KonradZaremba commented 1 year ago

OK, so just to clarify: if I load a TF model into ML.NET that needs a matrix ([,]) as input, will I get proper output?

julianogimenez commented 9 months ago

Any news on that? I'm working on a POC for a customer. I have a TensorFlow model (N, 160, 6) but I'm not able to input an array. How can I do that?

jannickj commented 9 months ago

@julianogimenez you just have to squeeze your multi-dim array into a single-dim array, i.e. from (N, 160, 6) -> (N * 160 * 6,)
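
As an illustration of that reshaping, here is a sketch of the index arithmetic for a (N, 160, 6) input. `BatchFlattening`, `Flatten`, and `FlatIndex` are hypothetical names, and the sketch assumes the model reads the flat buffer in row-major order, so that element `[n, t, f]` lands at position `(n * 160 + t) * 6 + f`.

```csharp
using System;

// Hypothetical helper, not an ML.NET API.
public static class BatchFlattening
{
    // Copies a float[,,] into a float[] of length N * steps * features.
    // Buffer.BlockCopy works on arrays of primitives and counts in bytes.
    public static float[] Flatten(float[,,] source)
    {
        var flat = new float[source.Length];
        Buffer.BlockCopy(source, 0, flat, 0, source.Length * sizeof(float));
        return flat;
    }

    // Position of element [n, t, f] inside the flattened array.
    public static int FlatIndex(int n, int t, int f, int steps, int features)
        => (n * steps + t) * features + f;
}
```

For a batch declared as `new float[2, 160, 6]`, `Flatten` returns an array of 1920 elements, and `flat[BatchFlattening.FlatIndex(1, 5, 3, 160, 6)]` holds the value that was stored at `batch[1, 5, 3]`.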