Do not remove dots from utterance in entity recognition

axa-group / nlp.js

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more

MIT License

6.27k stars 620 forks

Do not remove dots from utterance in entity recognition #1318

Open alberchou opened 1 year ago

alberchou commented 1 year ago

I have some entities that contain dots (for example: aaaaa.bbbbb.ccccc), and I need to set the accuracy to 1. When I do that, those entities are not recognized.

Is there any option to exclude certain characters from being used as token separators? Furthermore, is it possible to apply that only to entity recognition (and not to intent recognition)?

Thank you!
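
To make the problem concrete, here is a minimal, self-contained sketch of why an exact match (accuracy 1) can fail when dots are treated as token separators. It does not use nlp.js and makes no claim about how nlp.js tokenizes internally; `tokenizeSplittingDots`, `tokenizeKeepingDots`, and `findExactEntities` are hypothetical helpers written only for this illustration.

```javascript
// Hypothetical helpers for illustration only; none of these are part of nlp.js.

// Splits on anything that is not a letter or digit, so dots act as separators.
function tokenizeSplittingDots(text) {
  return text.split(/[^A-Za-z0-9]+/).filter(Boolean);
}

// Splits on whitespace only, then trims leading/trailing punctuation,
// so dots inside a token are preserved.
function tokenizeKeepingDots(text) {
  return text
    .split(/\s+/)
    .map((token) => token.replace(/^[.,;:!?]+|[.,;:!?]+$/g, ''))
    .filter(Boolean);
}

// Exact matching (assumed here to correspond to an accuracy of 1):
// a token counts only if it equals a known entity value.
function findExactEntities(tokens, entityValues) {
  const known = new Set(entityValues);
  return tokens.filter((token) => known.has(token));
}

const entityValues = ['aaaaa.bbbbb.ccccc'];
const utterance = 'Please check aaaaa.bbbbb.ccccc today.';

const withSplit = findExactEntities(tokenizeSplittingDots(utterance), entityValues);
const withDots = findExactEntities(tokenizeKeepingDots(utterance), entityValues);

console.log(withSplit); // []
console.log(withDots); // [ 'aaaaa.bbbbb.ccccc' ]
```

With the dot-splitting tokenizer the entity is broken into `aaaaa`, `bbbbb`, and `ccccc`, so no single token can equal the stored value. Keeping the dots inside tokens lets the exact comparison succeed, which is the behavior being asked for here.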

alberchou commented 1 year ago

One more thing, which may be a bug: when I send a text containing a long list of values (all previously added with the nlp.addNamedEntityText function), fewer entities are matched than there are values in the original string. In my case, 22 entities were found for a text containing 29 values.