pietrolesci opened this issue 2 years ago
cc @stancld opinion on this?
I'm not so familiar with this kind of metrics.. How much do these metrics differ from standard classification ones? :] @pietrolesci
Hi @stancld,
I think it's not much different. The convenience of having sequence-level metrics already available is that:

- they can be fed sequences directly (without manual iteration);
- they can implement different evaluation "policies", e.g. "strict" vs "non-strict". For example,

  ```
  pred: [A, A, B]
  true: [A, B, B]
  ```

  can be considered either partially correct or entirely incorrect, which of course affects how results are aggregated. A practical example is in the README.md (see also the sketch after this list);
- they can make it easier to enforce particular encodings for the NER or POS tags (for example);
- last but not least, it would be nice to have these metrics in torchmetrics for consistency (i.e., no need to resort to other libraries/frameworks).
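For concreteness, here is a minimal sketch of the "strict" vs default policy using seqeval directly (the tag sequences and the IOB2 scheme below are illustrative, not taken from this thread):

```python
# Minimal sketch of "strict" vs default chunk evaluation with seqeval.
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "I-LOC"]]  # right entity type, wrong prefix

# Default (conlleval-style) policy: the dangling I-LOC still counts as a LOC chunk
print(f1_score(y_true, y_pred))                              # 1.0
# Strict policy with an explicit IOB2 scheme: the I-LOC chunk is rejected
print(f1_score(y_true, y_pred, mode="strict", scheme=IOB2))  # ≈ 0.67
```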
Hi @pietrolesci, I get the motivation and think this might be a nice contribution to torchmetrics. 👍
As these metrics will very likely inherit from the classification ones, I'd just wait a bit with this addition until the ongoing classification refactor #1001 is finalized :]
Hi @pietrolesci -- I think I should be able to find some time in the near future to have a look at this class of metrics. However, I'm not fully familiar with the current state of tagging metrics. Do you think it would make more sense for our public API to accept something like Sequence[Sequence[str]], or is it better to use torch.Tensor here? (I think transformers models tend to output tensors, so that would make sense as well.) Also, we could support both options and make sure everything is converted to tensors internally (as long as this isn't too confusing at our public API). What do you think? :]
cc: @Borda @SkafteNicki
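Purely as a hypothetical sketch (not existing torchmetrics code), accepting both input types and converting everything to tensors internally could look roughly like this; the helper name and label mapping are made up for illustration:

```python
# Hypothetical sketch: accept either nested string tags or a tensor at the
# public boundary and convert to integer tensors internally.
from typing import Dict, Sequence, Union
import torch

def _tags_to_tensor(
    tags: Union[Sequence[Sequence[str]], torch.Tensor],
    label2id: Dict[str, int],
) -> torch.Tensor:
    """Map string tags to integer ids; pass tensors through unchanged."""
    if isinstance(tags, torch.Tensor):
        return tags
    return torch.tensor([[label2id[t] for t in seq] for seq in tags])

label2id = {"O": 0, "B-PER": 1, "I-PER": 2}
print(_tags_to_tensor([["B-PER", "I-PER", "O"]], label2id))  # tensor([[1, 2, 0]])
```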
I think it would be good to explore this direction; also, we could set up a quick call with @pietrolesci to get more context, and maybe he could give us some intro... :rabbit:
🚀 Feature
Support for sequence tagging evaluation metrics à la seqeval. That is, support the evaluation of the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling, and so on.
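To make the request concrete, this is the kind of chunk-level evaluation seqeval provides (the tag sequences are illustrative):

```python
# Chunk-level evaluation with seqeval; tag sequences are illustrative.
from seqeval.metrics import classification_report

y_true = [["O", "B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["O", "B-PER", "I-PER", "O", "O"]]

# Precision/recall/F1 are computed over entity chunks, not individual tokens
print(classification_report(y_true, y_pred))
```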