curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License

Tokenization issue #97

Open fatjon95 opened 1 year ago

fatjon95 commented 1 year ago

Describe the bug While using EntityRecognition, the tokens returned for a string don't match the text that was sent.

To Reproduce Tokenize value: "Postcode: 0000AA,huis nr. 223."

"223." is tokenized into two tokens and assigned wrong values.