dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License

NLP NER Support #520

Open papyr opened 2 years ago

papyr commented 2 years ago

Hello, is NER supported, and does it have entity categories listed like spaCy does?

NiklasGustafsson commented 2 years ago

@papyr - no, not yet.

It'd be a great project for someone to start hacking away at. :-)

GeorgeS2019 commented 2 years ago

The one possible PyTorch project I could find is pytorch-ner ("Pipeline for training NER models using PyTorch").

Interestingly, this is not part of TorchText.

papyr commented 2 years ago

It would be nice if we could get that support into the roadmap; some of these are core use cases for us to migrate over to TorchSharp.

BTW, is there a plan for a scaffolder that generates a plugin structure, so we can write our own and contribute?

NiklasGustafsson commented 2 years ago

@tarekgh, @luisquintanilla -- any thoughts?

tarekgh commented 2 years ago

@luisquintanilla can give more info here about future support of NER.

luisquintanilla commented 2 years ago

NER is currently in the ML.NET roadmap and one of the next scenarios we're looking to tackle.

https://github.com/dotnet/machinelearning/blob/main/ROADMAP.md#named-entity-recognition-ner

We want to expose the scenario as a high-level API, similar to the current Text Classification API. Like the Text Classification API, it'll be powered by TorchSharp.

As we look to bring NER to ML.NET, one of the things we want to make sure of is that the E2E training scenario is as smooth as possible. What I mean by that is, although we can provide an API for training NER models, you'll need some way to tag / label your data. The NER scenario is somewhat unique in how data is represented and, to my understanding, there is no straightforward way today to perform the data labeling task.

With that said, any thoughts or feedback on how to make this a good experience are appreciated.
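To make the data-representation point concrete: one common convention for labeling NER data (not necessarily the one ML.NET will adopt) is BIO tagging, where each token is marked as the Beginning of an entity, Inside one, or Outside any entity. The sketch below assumes annotations arrive as token-index spans; `spans_to_bio`, the sample sentence, and the labels `PER`, `ORG`, and `LOC` are hypothetical and for illustration only.

```python
from typing import List, Tuple


# Hypothetical helper for illustration; not part of ML.NET or TorchSharp.
def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, label) into per-token BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span {(start, end, label)} is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span {(start, end, label)} overlaps another span")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Made-up sentence and annotations (assumed, not taken from any real dataset).
tokens = ["Alice", "Smith", "joined", "Contoso", "in", "Seattle"]
spans = [(0, 2, "PER"), (3, 4, "ORG"), (5, 6, "LOC")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A labeling tool would only need to collect the `(start, end, label)` spans; the per-token tags a trainer typically consumes can be derived mechanically, which is the part of the workflow that is hard to do by hand.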

papyr commented 2 years ago

A roadmap for these features should be easy to do. Many are going down the AWS path, since it's not available in Azure or native .NET.

Also, a simple T4 scaffolder, which is already natively built into VS, could help create models fairly fast if you enable the tooling with useful NER boilerplate. For example, during scaffolding you could choose the algorithm and the correct output format to import into a visualizer, or into a simple VS 2022 ASP.NET Core MVC view.

papyr commented 2 years ago

@luisquintanilla I saw the new roadmap, and thanks for adding it there. The core thing about adoption is ease of use, for e.g.

GeorgeS2019 commented 2 years ago

If PyTorch-NER provides TorchScript support, perhaps it is possible to use PyTorch-NER in TorchSharp?

GeorgeS2019 commented 1 year ago

@NiklasGustafsson

Currently NER is supported under ML.NET.

papyr commented 7 months ago

The inference on keywords, e.g. when scanning documents, and the context recognition do not match up with PyTorch.

The sample use case should be more general. When scanning documents (legal, medical, etc.), the input is unstructured data that we need to contextualize into structured data. This is still weak in this lib.
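As a rough illustration of the unstructured-to-structured step, the sketch below assumes a tagger that emits one BIO tag per token (`B-` begins an entity, `I-` continues it, `O` is outside any entity). That output format is an assumption; this thread does not specify what ML.NET returns. `bio_to_entities`, the sample tokens, and the `DRUG` and `ORG` labels are hypothetical.

```python
from typing import Dict, List


# Hypothetical helper for illustration; not part of ML.NET or TorchSharp.
def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Dict[str, object]]:
    """Group per-token BIO tags into structured entity records."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Dict[str, object]] = []
    current = None  # the entity currently being built, if any
    for i, (token, tag) in enumerate(zip(tokens, tags)):
        prefix, _, label = tag.partition("-")
        # An I- tag with a matching label extends the open entity.
        if prefix == "I" and current is not None and current["label"] == label:
            current["words"].append(token)
            current["end"] = i + 1
            continue
        # Anything else closes the open entity first.
        if current is not None:
            entities.append(current)
            current = None
        # B- starts a new entity; a stray I- is treated as a start too.
        if prefix in ("B", "I"):
            current = {"label": label, "start": i, "end": i + 1, "words": [token]}
    if current is not None:
        entities.append(current)

    return [
        {"label": e["label"], "start": e["start"], "end": e["end"],
         "text": " ".join(e["words"])}
        for e in entities
    ]


# Made-up tokens and tags (assumed model output, not real predictions).
tokens = ["The", "patient", "received", "Ibuprofen", "at", "Contoso", "Clinic"]
tags = ["O", "O", "O", "B-DRUG", "O", "B-ORG", "I-ORG"]
records = bio_to_entities(tokens, tags)

for record in records:
    print(record)
```

Each record carries a label, a token range, and the surface text, so it can be written to a table or JSON document for downstream legal or medical workflows. The quality of those records still depends entirely on the underlying model's predictions.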