Does this library make sense for assessing named entity annotations?

valentinoli commented 2 years ago

Hello

I am interested in this library for assessing named entity annotations on text for NLP applications.

I saw the example provided in the docs where you have an audio sample that has segments annotated by several annotators. Each segment consists of a span of time (start -> end) and a label.

My use case is similar. Each segment has a start and an end, as well as a label. However, the segments are spans of tokens and not time, so they are discrete and not continuous.

Would I still be able to use your lib as in this example, with CombinedCategoricalDissimilarity?

Are there any issues I should be aware of?

Does the lib have other methods to deal with such use cases?

Rachine commented 2 years ago

Yes, this is definitely a supported use case. It is also showcased in the paper by Mathet et al. that introduced the gamma agreement measure.

You can use integers to refer to the positions between your tokens, and use those positions as the boundaries of your entities.

Example:

The queen Elizabeth will be received in France at the Élysée Palace -> 0 The 1 queen 2 Elizabeth 3 will 4 be 5 received 6 in 7 France 8 at 9 the 10 Élysée 11 Palace 12
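The numbering above can be sketched in a few lines of Python. This is only an illustration, not part of the library: the helper names (`boundary_view`, `entity_span`, `to_continuum`), the whitespace tokenization, the annotator name and the PER/LOC labels are all assumptions made for the example. The last helper assumes the `Continuum.add(annotator, Segment(start, end), label)` call used in the documentation example mentioned earlier in this thread.

```python
def boundary_view(tokens):
    """Interleave boundary indices with tokens: '0 The 1 queen 2 ...'."""
    parts = ["0"]
    for i, tok in enumerate(tokens, start=1):
        parts += [tok, str(i)]
    return " ".join(parts)


def entity_span(tokens, entity_tokens):
    """Return (start, end) boundary positions of the first occurrence
    of entity_tokens in tokens. Boundary k sits just before token k
    (0-based), so a span covering tokens i..j-1 is (i, j)."""
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            return (i, i + n)
    raise ValueError(f"entity not found: {entity_tokens}")


def to_continuum(annotator, spans):
    """Hypothetical helper: load (start, end, label) triples into a
    pygamma-agreement Continuum. Imported lazily so the rest of the
    sketch runs without the library installed."""
    from pyannote.core import Segment
    from pygamma_agreement import Continuum

    continuum = Continuum()
    for start, end, label in spans:
        continuum.add(annotator, Segment(start, end), label)
    return continuum


# Naive whitespace tokenization (an assumption; use your own tokenizer).
tokens = "The queen Elizabeth will be received in France at the Élysée Palace".split()
view = boundary_view(tokens)

# Labels are illustrative only.
spans = [
    (*entity_span(tokens, ["Elizabeth"]), "PER"),
    (*entity_span(tokens, ["France"]), "LOC"),
    (*entity_span(tokens, ["Élysée", "Palace"]), "LOC"),
]
```

Each annotator's spans would be added to the same continuum under a different annotator name before computing the agreement, so all annotators must share the same tokenization.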

valentinoli commented 2 years ago

Cool, I will check that paper out. Thanks for the response!

bootphon / pygamma-agreement

Does this library make sense for assessing named entity annotations? #33