AnasAito / SkillNER

A (smart) rule based NLP module to extract job skills from text
https://skillner.vercel.app/
MIT License
131 stars, 47 forks

Make text cleaning optional. #57

Open ruben-dedoncker opened 1 year ago

ruben-dedoncker commented 1 year ago

Is your feature request related to a problem? Please describe.
The text cleaning step makes it impossible to link annotated spans to character indices in the original text. This in turn makes it impossible to compare this model's performance with that of other NER models.

Describe the solution you'd like
Make the text cleaning step optional. When the cleaning step is omitted, abv_text == immutable_text.

Describe alternatives you've considered
Provide additional metadata containing the start and end character indices of each annotated span in the original text, in addition to the boundaries in the cleaned text.
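To illustrate the alternative, here is a minimal sketch of a cleaner that records, for every character it keeps, the index of that character in the original text. It is not SkillNER code: `clean_with_offsets` and `to_original_span` are hypothetical helpers, and the cleaning rules (lowercase, punctuation to spaces, whitespace collapsed) are assumed only for the example.

```python
from typing import List, Tuple


def clean_with_offsets(text: str) -> Tuple[str, List[int]]:
    """Toy cleaner (assumed rules, not SkillNER's): lowercase, turn
    non-alphanumeric runs into a single space, and record the original
    index of every character that ends up in the cleaned text."""
    cleaned: List[str] = []
    offsets: List[int] = []
    prev_space = True  # start as True so leading separators are dropped
    for i, ch in enumerate(text):
        if ch.isalnum():
            # lower() may yield more than one character for some code points
            for c in ch.lower():
                cleaned.append(c)
                offsets.append(i)
            prev_space = False
        elif not prev_space:
            cleaned.append(" ")
            offsets.append(i)
            prev_space = True
    # Drop a trailing separator, if any
    if cleaned and cleaned[-1] == " ":
        cleaned.pop()
        offsets.pop()
    return "".join(cleaned), offsets


def to_original_span(offsets: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a half-open span [start, end) in the cleaned text back to a
    half-open span in the original text."""
    if not 0 <= start < end <= len(offsets):
        raise ValueError("span is outside the cleaned text")
    return offsets[start], offsets[end - 1] + 1


original = "Experienced in  Machine-Learning, and SQL!"
cleaned, offsets = clean_with_offsets(original)

# Pretend the annotator matched this phrase in the cleaned text
start = cleaned.index("machine learning")
end = start + len("machine learning")

o_start, o_end = to_original_span(offsets, start, end)
print(original[o_start:o_end])  # Machine-Learning
```

With such a map, each annotation could carry both pairs of boundaries, so results stay comparable with models that report offsets into the raw text.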

AnAnalogGuy commented 1 year ago

You could instantiate your own empty skillNer.cleaner.Cleaner to bypass text cleaning. However, you would also want to protect abv_text from later processing, which would require some changes to the code.
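A rough sketch of why a pass-through cleaner keeps spans aligned, using a toy annotator rather than SkillNER itself. Everything here is hypothetical (`identity_cleaner`, `collapsing_cleaner`, `find_spans`); how skillNer.cleaner.Cleaner is actually configured to do nothing is not shown, since that depends on its constructor.

```python
import re
from typing import Callable, List, Tuple


def identity_cleaner(text: str) -> str:
    """Hypothetical no-op cleaner: returns the text unchanged."""
    return text


def collapsing_cleaner(text: str) -> str:
    """Hypothetical cleaner that lowercases and collapses whitespace."""
    return " ".join(text.lower().split())


def find_spans(
    text: str, cleaner: Callable[[str], str], vocabulary: List[str]
) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Toy annotator (not SkillNER): look up each term in the cleaned
    text and return spans relative to that cleaned text."""
    cleaned = cleaner(text)
    spans = []
    for term in vocabulary:
        for m in re.finditer(re.escape(term), cleaned, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), term))
    return cleaned, spans


original = "Skills:   Python and  SQL"
vocab = ["python", "sql"]

# With the no-op cleaner, spans index directly into the original text
same_text, aligned = find_spans(original, identity_cleaner, vocab)
aligned_slices = [original[s:e] for s, e, _ in aligned]

# With a real cleaning step, the same spans no longer fit the original
_, shifted = find_spans(original, collapsing_cleaner, vocab)
shifted_slices = [original[s:e] for s, e, _ in shifted]

print(aligned_slices)
print(shifted_slices)
```

The second list shows the misalignment described in the issue: the offsets are valid for the cleaned text only.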
