diasks2 / pragmatic_tokenizer

A multilingual tokenizer to split a string into tokens
MIT License

feature overlap with pragmatic_segmenter? #9

Open maia opened 8 years ago

maia commented 8 years ago

Currently there is some overlap between pragmatic_tokenizer and pragmatic_segmenter; for example, both handle abbreviations. Should rules and constants that are shared between both gems (especially the language-specific ones) be extracted into a sub-gem? Or is there too little shared code to justify this?
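As a rough illustration of what such a sub-gem might look like (all names here are hypothetical, not actual pragmatic_tokenizer or pragmatic_segmenter code), both gems could depend on one small gem that owns the language-specific constants:

```ruby
# Sketch of a hypothetical shared gem (call it "pragmatic_shared")
# that both pragmatic_tokenizer and pragmatic_segmenter would depend
# on, so each language's rules live in one place instead of two.
module PragmaticShared
  module Abbreviations
    # Frozen so that both gems share a single immutable copy.
    EN = %w[etc mr mrs dr prof].freeze
    DE = %w[usw bzw dr].freeze
  end

  # Returns true if the token is a known abbreviation for the language.
  def self.abbreviation?(token, lang: :EN)
    Abbreviations.const_get(lang).include?(token.downcase)
  end
end

puts PragmaticShared.abbreviation?("Mr")              # => true
puts PragmaticShared.abbreviation?("usw", lang: :DE)  # => true
```

Each gem would then call `PragmaticShared.abbreviation?` instead of maintaining its own copy of the lists.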

And/or: should the constant arrays and hashes be converted from Ruby to .yml files? That might allow an app to load them only once, even when both gems use them.
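A minimal sketch of that idea, assuming a hypothetical `SharedConstants` module that neither gem currently has: the YAML file is parsed on first access and memoized, so repeated lookups (including from a second gem) reuse the same in-memory object instead of re-reading the file:

```ruby
require "yaml"
require "tmpdir"

# Hypothetical shared loader: reads a language's abbreviation list
# from a .yml file once and memoizes the result, so any gem that
# asks for the same file gets the already-loaded copy.
module SharedConstants
  @cache = {}

  class << self
    # path: path to a YAML file such as abbreviations/en.yml
    def abbreviations(path)
      @cache[path] ||= YAML.safe_load(File.read(path))
    end
  end
end

# Demo: write a small YAML file, then load it twice;
# the second call returns the same memoized object.
Dir.mktmpdir do |dir|
  path = File.join(dir, "en.yml")
  File.write(path, %w[etc mr mrs].to_yaml)

  first  = SharedConstants.abbreviations(path)
  second = SharedConstants.abbreviations(path)
  puts first.inspect        # => ["etc", "mr", "mrs"]
  puts first.equal?(second) # => true (same object, loaded once)
end
```

Whether this actually saves memory in practice would depend on both gems resolving to the same shared module rather than each bundling its own copy of the .yml files.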

diasks2 commented 8 years ago

I'd definitely be open to this if it reduced memory usage, improved speed, or made the gems easier to maintain. This is not high on my priority list right now, but I would of course be open to pull requests.