dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.02k stars 1.88k forks source link

Revise type mappings in MLNET-to-ONNX conversion #1198

Open justinormont opened 5 years ago

justinormont commented 5 years ago

Currently, for ONNX, we are mapping a U4 datatype (an unsigned 32-bit integer) to an Int64.

Should we be instead mapping the U4 datatype to Uint32 in ONNX? Or is there no support for a Uint32, and we're storing in an Int64?

https://github.com/dotnet/machinelearning/blob/96439752cbaad099093ca0e8c770576bf3a53ad4/src/Microsoft.ML.Onnx/OnnxUtils.cs#L329-L331

In the above code, you'll notice the mapping is currently:

The BL to Float & U4 to Int64 seem odd.

@wschin noted we have been mapping U4 to Int64 for the last two releases of WinML: https://github.com/dotnet/machinelearning/pull/947#discussion_r223443413

TomFinley commented 5 years ago

Indeed there are now 4 and 8 byte unsigned int types in ONNX, see:

https://github.com/dotnet/machinelearning/blob/96439752cbaad099093ca0e8c770576bf3a53ad4/src/Microsoft.ML.Onnx/OnnxMl.cs#L2969-L2970

We should use them, now that they exist.

justinormont commented 5 years ago

@TomFinley : Should we also map BL to Bool, or is there a reason behind Float?

Bool is available: https://github.com/dotnet/machinelearning/blob/96439752cbaad099093ca0e8c770576bf3a53ad4/src/Microsoft.ML.Onnx/OnnxMl.cs#L2962-L2964

Current BL mapping is to Float: https://github.com/dotnet/machinelearning/blob/96439752cbaad099093ca0e8c770576bf3a53ad4/src/Microsoft.ML.Onnx/OnnxUtils.cs#L308-L310

wschin commented 5 years ago

We also have map bool to float due to old op signatures. It'd be better to revise the entire framework against the latest ONNX spec.