dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

System.InvalidOperationException: Variable length input columns not supported on multidimensional array #6387

Closed by UtopiaDreamTeam 2 years ago

UtopiaDreamTeam commented 2 years ago

System Information (please complete the following information):

Describe the bug: "System.InvalidOperationException: Variable length input columns not supported" is thrown when using VectorTypeAttribute or the shapeDictionary parameter of OnnxCatalog.ApplyOnnxModel.

To Reproduce Steps to reproduce the behavior:

  1. Create new Console Project
  2. Add Microsoft.ML.OnnxTransformer
  3. Annotate an input property with [VectorType(0, 1, 1, 16)], or pass a shapeDictionary to ApplyOnnxModel:
    var onnxPredictionPipeline = mlContext.Transforms.ApplyOnnxModel(outputColumns, inputColumns,
        "model.onnx", shapeDictionary: new Dictionary<string, int[]>
        {
            ["context_word"] = new int[] { 0, 1 },
            ["context_char"] = new int[] { 0, 1, 1, 16 },
            ["query_word"] = new int[] { 0, 1 },
            ["query_char"] = new int[] { 0, 1, 1, 16 },
        },
        null, false, 100);
  4. Either variant throws the exception when TrivialEstimator.Fit() is called.

Screenshots, Code, Sample Projects: I'm trying to use this ONNX model: https://github.com/onnx/models/tree/main/text/machine_comprehension/bidirectional_attention_flow

context_word: [seq, 1] of string
context_char: [seq, 1, 1, 16] of string
query_word: [seq, 1] of string
query_char: [seq, 1, 1, 16] of string

Full Code

using System;                      // Console
using System.Collections.Generic;  // Dictionary<TKey, TValue>
using Microsoft.ML;
using Microsoft.ML.Transforms.Onnx;

namespace MachineLearning
{
    internal class Program
    {
        static void Main(string[] args)
        {
            MLContext mlContext = new MLContext();
            var onnxPredictionPipeline = GetPredictionPipeline(mlContext);
            var onnxPredictionEngine = mlContext.Model.CreatePredictionEngine<OnnxInput, OnnxOutput>(onnxPredictionPipeline);
            var testInput = new OnnxInput("A quick brown fox jumps over the lazy dog.", "What color is the fox?");

            var testOutput = onnxPredictionEngine.Predict(testInput);
            Console.WriteLine($"start:{testOutput.StartPos} end:{testOutput.EndPos} ");
        }

        static ITransformer GetPredictionPipeline(MLContext mlContext)
        {
            var inputColumns = new string[]
            {
                "context_word", "query_word","context_char","query_char"
            };

            var outputColumns = new string[] { "start_pos", "end_pos" };
            var onnxPredictionPipeline = mlContext.Transforms.ApplyOnnxModel(outputColumns, inputColumns,
                    "model.onnx", shapeDictionary: new Dictionary<string, int[]>
                    {
                        ["context_word"] = new int[] { 0, 1 },
                        ["context_char"] = new int[] { 0, 1, 1, 16 },
                        ["query_word"] = new int[] { 0, 1 },
                        ["query_char"] = new int[] { 0, 1, 1, 16 },
                    }, null, false, 100);
            var Dv = mlContext.Data.LoadFromEnumerable(new OnnxInput[] { new OnnxInput("A quick brown fox jumps over the lazy dog.", "What color is the fox?") });
            return onnxPredictionPipeline.Fit(Dv);
        }
    }
}

Output model

using Microsoft.ML.Data;

namespace MachineLearning
{
    public class OnnxOutput
    {
        [ColumnName("start_pos")]
        public int StartPos { get; set; }
        [ColumnName("end_pos")]
        public int EndPos { get; set; }
    }
}

Input model

using Microsoft.ML.Data;

namespace MachineLearning
{
    public class OnnxInput
    {
        public OnnxInput(string context, string query)
        {
            // Left empty in this repro: context and query are not tokenized
            // into the properties below.
        }

        [ColumnName("context_word")]
        public string[] ContextWord { get; set; }

        [ColumnName("context_char")]
        [VectorType(0,1, 1, 16)]
        public string[] ContextChar { get; set; }

        [ColumnName("query_word")]
        //[VectorType(-1,1)]
        public string[] QueryWord { get; set; }

        [ColumnName("query_char")]
        //[VectorType(-1,1,1,16)]
        public string[] QueryChar { get; set; }

    }
}

luisquintanilla commented 2 years ago

Hi @UtopiaDreamTeam

Here is a sample that uses the BiDAF ONNX model inside ML.NET. The issue is not the multidimensional arrays; it's the variable-sized vectors. To fix that, set each variable dimension to a known size of your choosing and make sure that the data you prepare matches that size.

https://gist.github.com/luisquintanilla/164176ec414e465246d6323aa62b38df
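
As a rough illustration of the "known size" idea (not the code from the gist), the sketch below replaces the variable sequence dimension with fixed lengths and pads or truncates the tokens to match. The class name FixedShapeSketch, the sizes 32 and 16 for the word counts, the empty-string pad token, and the flattened character layout are all assumptions made for this example; the real model may expect a different tokenization or padding value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET: fixed shapes plus matching padding.
public static class FixedShapeSketch
{
    // Assumed fixed sizes; pick values large enough for your own data.
    public const int MaxContextWords = 32;
    public const int MaxQueryWords = 16;
    public const int CharsPerWord = 16; // matches the 16 in the model's char inputs

    // Same keys as the shapeDictionary in the issue, with the leading 0
    // (variable length) replaced by a fixed size.
    public static Dictionary<string, int[]> BuildShapes() => new Dictionary<string, int[]>
    {
        ["context_word"] = new[] { MaxContextWords, 1 },
        ["context_char"] = new[] { MaxContextWords, 1, 1, CharsPerWord },
        ["query_word"] = new[] { MaxQueryWords, 1 },
        ["query_char"] = new[] { MaxQueryWords, 1, 1, CharsPerWord },
    };

    // Returns exactly `length` items: extra items are dropped,
    // missing ones are filled with `pad` (an assumed padding token).
    public static string[] PadOrTruncate(IEnumerable<string> items, int length, string pad = "")
    {
        var result = Enumerable.Repeat(pad, length).ToArray();
        int i = 0;
        foreach (var item in items)
        {
            if (i == length) break;
            result[i++] = item;
        }
        return result;
    }

    // Flattens each word into `charsPerWord` single-character strings,
    // giving words.Length * charsPerWord entries in row-major order.
    public static string[] ToCharGrid(string[] words, int charsPerWord, string pad = "")
    {
        return words
            .SelectMany(w => PadOrTruncate(w.Select(c => c.ToString()), charsPerWord, pad))
            .ToArray();
    }
}
```

With this in place, the query "What color is the fox?" split on spaces becomes a 16-entry query_word array and a 256-entry query_char array, and BuildShapes() could be passed as the shapeDictionary argument of the ApplyOnnxModel call shown in the issue. Whitespace splitting is only a stand-in here; the tokenization the model was trained with should be used for real predictions.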