dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

How can TorchSharp and ONNX address the pain points from the ~900 responses to the April 2021 ML.NET survey? #5874

Open · GeorgeS2019 opened this issue 3 years ago

GeorgeS2019 commented 3 years ago

Is your feature request related to a problem? Please describe.

Yes: the pain points raised in the April 2021 ML.NET survey and in the discussion of its results.


Describe the solution you'd like

The survey makes it clear that NLP is a high priority. That means supporting more deep learning NLP use cases, e.g. using ML.NET to load pretrained Hugging Face transformer models through ONNX Runtime.

We can address these ML.NET pain points by porting some of the PyTorch NLP Transformer code to C#!

Porting is now more feasible thanks to [the recent renaming effort in TorchSharp](https://github.com/xamarin/TorchSharp/issues/308#issuecomment-877349861), which makes TorchSharp code resemble PyTorch much more closely.

michaelgsharp commented 3 years ago

@briacht can you take a look at this?

briacht commented 2 years ago

Yes! I had a conversation with @GeorgeS2019 about deep learning in .NET.

As part of our deep learning plan, we will enable NLP scenarios.

I believe these suggestions will start at the TorchSharp level.

GeorgeS2019 commented 2 years ago

@michaelgsharp @briacht I just found an error in the C# example for method 2 of OnnxCatalog.ApplyOnnxModel.

Method 2 takes a shapeDictionary parameter, which is particularly useful for working with variable-dimension inputs and outputs.

All examples provided for OnnxCatalog.ApplyOnnxModel use image classification with the SqueezeNet ONNX model from the ONNX Model Zoo.

Respondents to the April survey asked for NLP use cases.

There are more image-related ML.NET examples than NLP ones. Unlike NLP use cases, image use cases often do not involve variable-dimension inputs and outputs.

I suggest that the documentation provide, in addition to the image examples, NLP examples based on an ONNX Model Zoo model such as GPT-2. These would show how to deal with variable dimensions and why ShapeDictionary is needed.

> shapeDictionary, which is particularly useful for working with variable dimension inputs and outputs.

This statement was added to the documentation through this PR to address the need to handle the variable axes of ONNX models, which are common in NLP use cases.

We need to expand the documentation to explain how to handle variable axes (e.g. using ShapeDictionary), especially in NLP use cases.
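To make the variable-axes point concrete: NLP models exported to ONNX typically declare symbolic axes such as a batch or sequence dimension, while image models like SqueezeNet usually declare fixed shapes. Roughly speaking, shapeDictionary lets the caller supply concrete shapes for named ONNX inputs/outputs in place of the ones declared in the model. The sketch below illustrates that idea only; it is not ML.NET's implementation, and the helper names (`resolve_shape`, `resolve_all`) and tensor names (`input_ids`, `logits`) are hypothetical.

```python
# Hypothetical sketch of what a "shape dictionary" does conceptually.
# Not ML.NET code: it only shows how concrete shapes supplied by the user
# replace the variable (symbolic) axes that an ONNX model declares.
from typing import Dict, List, Optional, Sequence, Union

# int = fixed axis; str or None = variable axis (e.g. "batch", "sequence")
Dim = Union[int, str, None]


def resolve_shape(declared: Sequence[Dim], override: Optional[Sequence[int]]) -> List[int]:
    """Return a concrete shape for one model input or output.

    declared: shape as declared in the model, e.g. [1, "sequence"].
    override: concrete shape from the shape dictionary, or None if absent.
    """
    if override is None:
        # Without an override, every axis must already be fixed.
        unresolved = [d for d in declared if not isinstance(d, int)]
        if unresolved:
            raise ValueError(f"variable axes {unresolved} need a shape dictionary entry")
        return list(declared)
    if len(override) != len(declared):
        raise ValueError(f"rank mismatch: model has {len(declared)} dims, override has {len(override)}")
    resolved = []
    for decl, given in zip(declared, override):
        # A fixed axis may only be "overridden" with the same value.
        if isinstance(decl, int) and decl != given:
            raise ValueError(f"fixed dim {decl} cannot be overridden with {given}")
        resolved.append(given)
    return resolved


def resolve_all(model_shapes: Dict[str, Sequence[Dim]],
                shape_dictionary: Dict[str, Sequence[int]]) -> Dict[str, List[int]]:
    """Apply a name -> shape dictionary to every tensor the model declares."""
    return {name: resolve_shape(shape, shape_dictionary.get(name))
            for name, shape in model_shapes.items()}


if __name__ == "__main__":
    # Illustrative, hypothetical NLP-style model: a token-ID input with a
    # variable sequence axis and an output that depends on it.
    model_shapes = {"input_ids": [1, "sequence"], "logits": [1, "sequence", 1000]}
    print(resolve_all(model_shapes, {"input_ids": [1, 8], "logits": [1, 8, 1000]}))
```

An image model with only fixed axes resolves without any dictionary entry, whereas the NLP-style model above fails until the sequence length is supplied. That is exactly the case the current image-only examples never exercise.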

briacht commented 2 years ago

Thanks for the suggestion @GeorgeS2019!

@luisquintanilla can we add an issue to the docs repo for this?

GeorgeS2019 commented 2 years ago

@briacht

As @antoniovs1029 correctly pointed out:

> ML.NET doesn't currently have any transformer to do tensor reshaping, and it's necessary for users to actually implement their own reshape logic

This missing feature was raised by @yaeldekel here: Add "Reshape Transform".
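For reference, the "reshape logic" users currently write themselves is typically a row-major reshape of a flat buffer (ONNX outputs arrive as flat arrays plus a shape), often with one axis left as -1 to be inferred, as in NumPy's `reshape`. The sketch below is a minimal, hypothetical illustration of that operation in plain Python. It is not the proposed ML.NET transform, and the helper names `infer_shape` and `reshape` are assumptions for illustration.

```python
# Hypothetical sketch of the reshape logic a "Reshape Transform" would cover:
# turn a flat, row-major buffer into a nested tensor, inferring one -1 axis.
from math import prod
from typing import List, Sequence


def infer_shape(size: int, shape: Sequence[int]) -> List[int]:
    """Replace a single -1 in `shape` with the size implied by `size` elements."""
    dims = list(shape)
    if dims.count(-1) > 1:
        raise ValueError("at most one dimension can be -1")
    known = prod(d for d in dims if d != -1)
    if -1 in dims:
        if known == 0 or size % known != 0:
            raise ValueError(f"cannot infer -1 for {size} elements (known product {known})")
        dims[dims.index(-1)] = size // known
    if prod(dims) != size:
        raise ValueError(f"shape {dims} does not match {size} elements")
    return dims


def reshape(flat: Sequence[float], shape: Sequence[int]):
    """Reshape a flat, row-major buffer into nested lists of the given shape."""
    dims = infer_shape(len(flat), shape)
    if not dims:
        return flat[0]  # scalar: shape [] holds exactly one element

    def build(offset: int, depth: int):
        if depth == len(dims) - 1:
            # Innermost axis: a contiguous slice of the flat buffer.
            return list(flat[offset:offset + dims[depth]])
        stride = prod(dims[depth + 1:])  # elements per step along this axis
        return [build(offset + i * stride, depth + 1) for i in range(dims[depth])]

    return build(0, 0)


if __name__ == "__main__":
    # e.g. a flat output of 2 sequences x 3 positions, batch size inferred
    print(reshape([1, 2, 3, 4, 5, 6], [-1, 3]))
```

Letting one axis be inferred matters for NLP: the sequence length usually differs per input, so the same reshape spec can serve every row.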

=> I request that the team look into this "tensor reshaping" together with the variable-dimension handling discussed above. Perhaps the two are related.

Here are some NLP cases that apply ONNX models through OnnxCatalog.ApplyOnnxModel.

Could implementing the "Reshape Transform" address some of the challenges of working on NLP with ML.NET?