Related to this, I have created a repository documenting what I was trying to do. Unfortunately there is an error that stops me from running the model.
Unhandled exception. System.ArgumentOutOfRangeException: Could not determine an IDataView type and registered custom types for member InputIds (Parameter 'rawType')
at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(String name, Type rawType, IEnumerable`1 attributes, Boolean& isVector, Type& itemType)
at Microsoft.ML.Data.InternalSchemaDefinition.GetVectorAndItemType(MemberInfo memberInfo, Boolean& isVector, Type& itemType)
at Microsoft.ML.Data.SchemaDefinition.Create(Type userType, Direction direction)
at Microsoft.ML.Data.InternalSchemaDefinition.Create(Type userType, Direction direction)
at Microsoft.ML.Data.DataViewConstructionUtils.CreateFromEnumerable[TRow](IHostEnvironment env, IEnumerable`1 data, SchemaDefinition schemaDefinition)
at Microsoft.ML.DataOperationsCatalog.LoadFromEnumerable[TRow](IEnumerable`1 data, SchemaDefinition schemaDefinition)
at Library.Test.get_prediction_pipeline(String file_path, MLContext mlContext) in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Lib.fs:line 33
at Library.Test.run(String file_path) in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Lib.fs:line 40
at <StartupCode$App>.$Program.main@() in D:\Developer\triandco\blau\prototypes\sbert-dotnet\src\App\Program.fs:line 5
I'm still unsure whether this is caused by my lack of understanding or whether it is actually a bug.
Here are my current input and output models:
type OnnxInput() =
    [<ColumnName("input_ids")>]
    member val InputIds: int64 seq seq = [[]] with get, set

    [<ColumnName("attention_mask")>]
    member val AttentionMask: int64 seq seq = [[]] with get, set

type OnnxOutput() =
    [<ColumnName("last_hidden_state")>]
    member val LastHiddenState: float32 seq seq = [[]] with get, set
I have also tried the OnnxSequenceType attribute, only to receive the same error message.
type OnnxInput() =
    [<ColumnName("input_ids"); OnnxSequenceType(typedefof<int64 seq>)>]
    member val InputIds: int64 seq seq = [[]] with get, set

    [<ColumnName("attention_mask"); OnnxSequenceType(typedefof<int64 seq>)>]
    member val AttentionMask: int64 seq seq = [[]] with get, set

type OnnxOutput() =
    [<ColumnName("last_hidden_state"); OnnxSequenceType(typedefof<float32 seq>)>]
    member val LastHiddenState: float32 seq seq = [[]] with get, set
@luisquintanilla I know we have already discussed making this whole process more intuitive. Any quick pointers to help here though? You are much more familiar with F# than I am.
Hi @triandco,
Thanks for your question. There are a few issues at hand here:
1. ML.NET expects tensors (N-dimensional arrays) to be represented as one-dimensional. For example, I would change the definition of InputIds to:

   member val InputIds: int64 seq = [] with get, set

2. ML.NET works with Single values, so you might want to perform some mapping:

   [<ColumnName("input_ids"); OnnxMapType(typedefof<Int64>, typedefof<Single>); OnnxSequenceType(typedefof<Single>)>]

3. ML.NET supports only one unknown dimension. For example, batch and sequence are both unknown dimensions for input_ids; you can tell because they have a variable name instead of a number. While you can set one of the dimensions to -1 to indicate it is unknown, you need to define the rest of the dimensions. While not the same, here is a sample that does that with the BiDAF ONNX model.

A rough sketch pulling these three points together is below.
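The sketch is untested and makes a few assumptions: a padded sequence length of 256 (use whatever your tokenizer pads to), a batch dimension pinned to 1, and the input_ids / attention_mask / last_hidden_state column names from the model discussed above. Adjust all of these to match your export.

open Microsoft.ML
open Microsoft.ML.Data

// Assumed, not prescribed: pad/truncate the tokenizer output to this length.
let [<Literal>] SequenceLength = 256

// Inputs flattened to a single dimension per column, with the batch pinned to 1.
[<CLIMutable>]
type FixedOnnxInput =
    {
        [<ColumnName("input_ids"); VectorType(1, SequenceLength)>]
        InputIds: int64[]
        [<ColumnName("attention_mask"); VectorType(1, SequenceLength)>]
        AttentionMask: int64[]
    }

// last_hidden_state flattened from [1; SequenceLength; 768] into one vector.
[<CLIMutable>]
type FixedOnnxOutput =
    {
        [<ColumnName("last_hidden_state"); VectorType(1, SequenceLength, 768)>]
        LastHiddenState: float32[]
    }

let buildEngine (mlContext: MLContext) (modelPath: string) =
    // Pin the two unknown dimensions (batch, sequence) so ML.NET sees fixed shapes.
    let shapes =
        dict [ "input_ids", [| 1; SequenceLength |]
               "attention_mask", [| 1; SequenceLength |] ]
    let pipeline =
        mlContext.Transforms.ApplyOnnxModel(
            [| "last_hidden_state" |],
            [| "input_ids"; "attention_mask" |],
            modelPath,
            shapes)
    // The ONNX transform learns nothing, so fitting on an empty data view is enough.
    let empty = mlContext.Data.LoadFromEnumerable<FixedOnnxInput>(Seq.empty<FixedOnnxInput>)
    let model = pipeline.Fit(empty)
    mlContext.Model.CreatePredictionEngine<FixedOnnxInput, FixedOnnxOutput>(model)

Whether you pin the dimensions through VectorType, through the shape dictionary, or through both may depend on the model; the sketch does both to be explicit.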
Hope this helps.
Another unsolicited tip: you can use records with F#.
[<CLIMutable>]
type OnnxInput =
    {
        [<ColumnName("input_ids")>] InputIds : int64 seq
        //...
    }
I am facing the same issue here. My data type is int64[batch,sequence], and I still don't have any clue how to make it work. Did anyone figure it out?
Thanks in advance for any help!
@yli223 I had some luck with the tips from @luisquintanilla; they were very helpful in terms of understanding the types, thank you @luisquintanilla. However, I gave up in the end because I kept getting blocked by some other issue.
In the end, I decided to use an InferenceSession to run the model. You can see my code here.
It doesn't have all the type safety of the above implementation, but it works.
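For anyone landing here later, the InferenceSession route looks roughly like the sketch below (this is not the linked code, just a minimal approximation). It uses Microsoft.ML.OnnxRuntime directly and assumes a single sentence that has already been tokenised into input_ids and attention_mask arrays.

open System
open Microsoft.ML.OnnxRuntime
open Microsoft.ML.OnnxRuntime.Tensors

// Runs one tokenised sentence through the model and returns the flattened
// last_hidden_state values (shape [1; sequence; 768] for this model).
let embed (modelPath: string) (inputIds: int64[]) (attentionMask: int64[]) =
    use session = new InferenceSession(modelPath)
    // Batch of one: shape is [batch = 1; sequence = token count].
    let shape = [| 1; inputIds.Length |]
    let idsTensor = DenseTensor<int64>(Memory<int64>(inputIds), ReadOnlySpan<int>(shape))
    let maskTensor = DenseTensor<int64>(Memory<int64>(attentionMask), ReadOnlySpan<int>(shape))
    let inputs =
        ResizeArray [
            NamedOnnxValue.CreateFromTensor("input_ids", idsTensor)
            NamedOnnxValue.CreateFromTensor("attention_mask", maskTensor)
        ]
    use results = session.Run(inputs)
    results
    |> Seq.find (fun r -> r.Name = "last_hidden_state")
    |> fun r -> r.AsTensor<float32>() |> Seq.toArray

Pooling the token vectors into a single sentence embedding is left out, since that depends on how the model was exported.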
@triandco Thank you!
Is your feature request related to a problem? Please describe.
I was trying to follow a guide on how to use an ONNX model with .NET. I find it difficult to understand how to translate the data types of the input and output into C#. The types as displayed in Netron look like Python, but not quite. From the example I understand that something like int32[n,1] would be a single int32 value; however, in one of my models I found the type float32[batch,sequence,768], which is harder to translate.
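To make the question concrete, my literal reading of the notation (which may well be wrong) is that named dimensions are only known at run time, while numeric ones are fixed. Something like:

// int32[n,1]: one run-time dimension (n) and one fixed dimension of size 1,
// i.e. effectively a single int32 value per row.
let singleValue : int32[,] = Array2D.zeroCreate 1 1

// float32[batch,sequence,768]: two run-time dimensions (batch, sequence) and one
// fixed dimension (768). As a plain .NET value, e.g. with batch = 1, sequence = 128:
let lastHiddenState : float32[,,] = Array3D.zeroCreate 1 128 768

What I cannot work out is how that is meant to be expressed as an ML.NET input/output class.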
Describe the solution you'd like
Is there any further documentation on what these types are?

Describe alternatives you've considered
I opened an issue on Netron's repo asking whether there is any documentation on the types, and the author suggested that I post an issue on the MS side.
I appreciate that there is already an open issue about improving this documentation. However, is there any quick pointer on this particular one?

Context
The model I was trying to run is an ONNX export of msmarco-distilbert-base-tas-b from Hugging Face.
Much appreciated.