curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License

SentenceDetector features extraction bug #73

Open gdsmiler opened 2 years ago

gdsmiler commented 2 years ago

Describe the bug https://github.com/curiosity-ai/catalyst/blob/cd575bfa2ce3e114a6ea03770ae440e439b283cd/Catalyst/src/Models/Base/SentenceDetector.cs#L418 That line and the next three use the equality operator, which is not implemented for IToken, so those comparisons are always false.
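One way such a comparison can end up always false is sketched below. This is only an illustration, not the code from SentenceDetector.cs: `IDemoToken` and `DemoToken` are hypothetical stand-ins, and the sketch assumes the tokens being compared are value types that get boxed when accessed through the interface. In that case `==` on the interface type falls back to reference equality, which compares the two boxes rather than their contents.

```csharp
using System;

// Hypothetical interface standing in for a token abstraction (not the library's IToken).
public interface IDemoToken
{
    string Value { get; }
}

// Hypothetical value-type token; no == operator is defined for it or for the interface.
public readonly struct DemoToken : IDemoToken
{
    public DemoToken(string value) => Value = value;
    public string Value { get; }
}

public static class Program
{
    public static void Main()
    {
        // Each assignment boxes the struct into a separate heap object.
        IDemoToken a = new DemoToken("BOS");
        IDemoToken b = new DemoToken("BOS");

        // == on interface-typed operands is reference equality: two boxes, so False.
        Console.WriteLine(a == b);

        // Equals dispatches to ValueType.Equals, which compares the fields: True.
        Console.WriteLine(a.Equals(b));
    }
}
```

Under that assumption, comparing with `Equals` (or comparing a distinguishing property of the two tokens) would give the intended result, whereas `==` would not.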

https://github.com/curiosity-ai/catalyst/blob/cd575bfa2ce3e114a6ea03770ae440e439b283cd/Catalyst/src/Models/Base/SentenceDetector.cs#L423 The next token can never be BOS and the previous token can never be EOS when a document is processed for sentences.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

theolivenbaum commented 2 years ago

Still relevant