Closed sfiruch closed 2 years ago
They do have support for word embeddings with the ApplyWordEmbedding
method. It can use a predefined model or you can use a custom model with it.
Here's the pipeline from the sample:
var textPipeline = mlContext.Transforms.Text.NormalizeText("Text")
.Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text"))
.Append(mlContext.Transforms.Text.ApplyWordEmbedding("Features",
"Tokens", WordEmbeddingEstimator.PretrainedModelKind.SentimentSpecificWordEmbedding));
Hope that helps!
Updating the title. As @jwood803 mentioned above, we already support pretrained word embeddings. You can also consume a contextualized word embedding model for prediction with Onnx. However, we currently don't support training for these models.
@Lynx1820 I think the title change to just "training" does not capture my feature request. The pretrained models are all context-free word embeddings. Such embeddings are not/no longer state-of-the-art.
Thanks for the clarification. I wrote training because inferencing is already possible by importing an Onnx model, but I understand your ask for a better API.
Hi @Lynx1820! Has there been any progress on this issue? Any updates?
On another note, I am experimenting with BERT on ML.NET. Where can I find example codes of end-to-end ML with BERT as part of preprocessing for classification tasks?
Unfortunately, there hasn't been any progress on this request. You can take a look at the workaround repo, provided in the description, as we don't have an example using BERT. Note that the pipeline would like very similar to other inferencing onnx models. You would just have another onnx model.
@aaidarov Take look at https://github.com/GerjanVlot/BERT-ML.NET it might give you some ideas.
@GerjanVlot We are attempting to develop a .NET Onnx viewer/editor that will make integration of e.g. BERT ONNX to .NET for intuitive. We need your feedback!!!!
Hi all,
Thanks for the discussion and feedback. We've recently implemented the Text Classification API which might satisfy some of the requirements listed here. The underlying model is based on a BERT-variant.
For more information, see the Blog post and sample notebook.
Closing this issue for now.
Intro
ML.NET is not yet very well equipped for some natural-language processing (NLP) workloads. While there is already support for basic processing steps (tokenization, stop word removal, ...) and sentiment, other, higher-level workloads are not yet supported.
Feature request
Use case
In our specific use case, we develop document classifiers. We only have a limited set of labeled documents to train with. Our plan is to use a pretrained or trained document embeddings, and learn a simple classifier on top, using the labeled documents.
Workarounds
There is already a project that runs BERT as ONNX on top of ML.NET, see https://github.com/GerjanVlot/BERT-ML.NET. I'd like to see this become an official part of ML.NET, with a good API, properly maintained and updated.
Outlook
These models are building-blocks for other features, like entity recognition (#630). Ideally ML.NET would support many more NLP tasks, as listed in https://github.com/microsoft/nlp-recipes#content. Generally, we notice an uptake in NLP-related project inquiries.