Status: Open. freecraver opened this issue 2 years ago.
@freecraver I'm open to changing the tokenizer. Would you be interested in investigating the effort to switch over?
Sure - please check https://github.com/cdimascio/py-readability-metrics/pull/27 for my suggested changes.
Thanks for your work on this nice project.
I intend to create a library for text simplification and would potentially like to integrate your package. The choice of tokenizer affects the resulting readability scores, and I was wondering how you approached this issue.
Was there a specific reason for choosing the TweetTokenizer over, e.g., the default/recommended NLTK tokenizer, which more closely follows the Penn Treebank's definition of word boundaries? https://github.com/cdimascio/py-readability-metrics/blob/3ffb97f6057ae2451599d083a69ece78a61a6fa4/readability/text/analyzer.py#L128
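To make the concern concrete, here is a minimal sketch of how the word-boundary convention alone can shift a count-based metric. The two functions are simplified regex stand-ins that I wrote for illustration, not NLTK's actual tokenizers and not this package's code: one keeps contractions such as "don't" as a single token, the other splits them Penn Treebank style ("do" + "n't"). The sample text and the sentence-splitting rule are also assumptions.

```python
import re

# Hypothetical stand-in: keeps contractions such as "don't" as one token.
def tokenize_keep_contractions(text):
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)

# Hypothetical stand-in: splits clitics the way the Penn Treebank
# convention does, e.g. "don't" -> "do", "n't" and "it's" -> "it", "'s".
_CLITIC = re.compile(r"(.+?)(n't|'(?:s|re|ve|ll|d|m))", re.IGNORECASE)

def tokenize_split_contractions(text):
    tokens = []
    for word in tokenize_keep_contractions(text):
        match = _CLITIC.fullmatch(word)
        if match:
            tokens.extend(match.groups())
        else:
            tokens.append(word)
    return tokens

def average_words_per_sentence(text, tokenize):
    # Naive sentence count: runs of terminal punctuation (an assumption).
    sentence_count = len(re.findall(r"[.!?]+", text)) or 1
    return len(tokenize(text)) / sentence_count

sample = "I don't think it's hard. We can't be sure."
keep_avg = average_words_per_sentence(sample, tokenize_keep_contractions)
split_avg = average_words_per_sentence(sample, tokenize_split_contractions)
print(keep_avg, split_avg)
```

On this sample the first convention yields 9 words and the second 12, so the average sentence length differs (4.5 vs. 6.0) before any readability formula is applied. Formulas that depend on words per sentence or syllables per word would inherit that difference.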