GlobalMaksimum / sadedegel

A General Purpose NLP library for Turkish
http://sadedegel.ai
MIT License

Make SBD work only on `Document.sents` access #296

Closed dafajon closed 2 years ago

dafajon commented 2 years ago

Currently, whenever a Doc instance is initialized, the sentencer (the sentence boundary detector, SBD) runs to split the raw document into sentences and build the sadedegel.Sentences objects. This slows down any process that does not need sentence splitting up front.

The proposed solution is to run SBD only when the `.sents` attribute is accessed, either directly by the user or through any implemented/overridden methods (`__iter__`, `__len__`) that depend on it.
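A minimal sketch of the idea, not sadedegel's actual implementation: `LazyDoc`, its regex-based `_split` helper, and the `sbd_calls` counter are all hypothetical names made up for illustration. It assumes the sentences can be cached on the instance after the first access, which `functools.cached_property` does here.

```python
import re
from functools import cached_property
from typing import List


class LazyDoc:
    """Hypothetical stand-in for a document class; not sadedegel's Doc."""

    sbd_calls = 0  # counts splitter runs, for illustration only

    def __init__(self, raw: str):
        self.raw = raw  # no sentence splitting happens at construction

    @staticmethod
    def _split(raw: str) -> List[str]:
        # Naive placeholder for a real sentence boundary detector:
        # split on whitespace that follows '.', '!' or '?'.
        return [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]

    @cached_property
    def sents(self) -> List[str]:
        # Runs on first access only; the result is cached on the instance.
        LazyDoc.sbd_calls += 1
        return self._split(self.raw)

    def __iter__(self):
        # Iteration goes through .sents, so it triggers the split if needed.
        return iter(self.sents)

    def __len__(self):
        return len(self.sents)


doc = LazyDoc("Merhaba dünya. Bugün hava güzel.")
calls_after_init = LazyDoc.sbd_calls  # splitter has not run yet
n_sents = len(doc)                    # first access triggers the split
sentences = list(doc)                 # reuses the cached sentences
```

With this shape, code that only touches `doc.raw` never pays for sentence splitting, while `len(doc)` and iteration still work because they route through `.sents`.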

askarbozcan commented 2 years ago

This issue can be closed. See #299.