curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License
715 stars, 73 forks

Sentence detection broken? #41

Closed BernhardGlueck closed 3 years ago

BernhardGlueck commented 4 years ago

Describe the bug: Sentence detection only reacts to punctuation.

To Reproduce: Run the sentence detector on text whose sentences are not separated by punctuation.

Expected behavior: The sentence detector should detect where a sentence ends even in the absence of punctuation.

Also note that when training sentence detector models from scratch using UD, the resulting models are extremely small: about 2 KB for English or German.

theolivenbaum commented 3 years ago

Hi @BernhardGlueck, do you have an example of a sentence that causes this issue?