GlobalMaksimum / sadedegel

A General Purpose NLP library for Turkish
http://sadedegel.ai
MIT License
93 stars 15 forks

Activate sentencer on sentences access [Resolves #296] #299

Closed dafajon closed 2 years ago

dafajon commented 2 years ago
askarbozcan commented 2 years ago

Expected performance has been achieved.

Tokenization time (s), on the extended raw dataset with the ICU tokenizer: Old: 790, New: 38

TF-IDF generation time (s): Old: 1074, New: 176

No issues were found, except that there are now two private variables, _sents and _sentences, which might be confusing later on. EDIT: It may be preferable to keep it this way for backward compatibility.
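
For readers skimming the thread, the idea in the PR title (run the sentencer only when sentences are first accessed) can be sketched roughly as below. This is a hypothetical illustration, not sadedegel's actual code: the `LazyDoc` class, the regex splitter, and the `split_calls` counter are all made up for the example, and the way the real library uses `_sents` and `_sentences` is an assumption.

```python
import re


class LazyDoc:
    """Hypothetical document class; NOT sadedegel's actual Doc."""

    def __init__(self, raw):
        self.raw = raw
        self._sents = None    # cache, filled on first access
        self.split_calls = 0  # counts how often the splitter runs

    def _split(self):
        # Naive regex stand-in for a real sentence boundary detector.
        self.split_calls += 1
        parts = re.split(r"(?<=[.!?])\s+", self.raw.strip())
        return [p for p in parts if p]

    @property
    def sentences(self):
        # Run the splitter only when sentences are first requested.
        if self._sents is None:
            self._sents = self._split()
        return self._sents

    @property
    def _sentences(self):
        # Assumed legacy alias, kept so older callers keep working.
        return self.sentences


doc = LazyDoc("Merhaba dünya. Bugün hava güzel!")
first = doc.sentences   # splitter runs here
second = doc.sentences  # cached list is reused, no second split
```

Constructing a `LazyDoc` does no sentence splitting at all, so callers that only need tokens never pay for it. That is one plausible reason for speedups of the kind reported above, though the thread does not spell out the mechanism.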

husnusensoy commented 2 years ago

Excited!!! Will be testing and merging ASAP.

