cdimascio / py-readability-metrics

📗 Score text readability using a number of formulas: Flesch-Kincaid Grade Level, Gunning Fog, ARI, Dale Chall, SMOG, and more
https://py-readability-metrics.readthedocs.io/en/latest/
MIT License
348 stars 58 forks

Add support for additional languages beyond English? #2

Open delzennejc opened 5 years ago

delzennejc commented 5 years ago

Hi,

Thank you very much, this is awesome! I would like to know whether this also works for Romance languages such as French, Spanish, or Portuguese, or whether it supports only English.

cdimascio commented 5 years ago

Thanks so much for your interest @Prattjames.

Currently, the package supports only English. It uses sent_tokenize, which by default uses PunktSentenceTokenizer (punkt) under the covers. punkt appears to support other languages.

After a quick review, it seems that to enable other languages, we would need to update the [sent_tokenize](https://github.com/cdimascio/py-readability-metrics/blob/master/readability/text/analyzer.py#L66) call to specify another punkt-supported language, e.g.

sent_tokenize(text, language='spanish')  # where spanish is any language supported by punkt

It seems that making this configurable through this package would enable us to support more languages.
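A minimal sketch of what "configurable" could look like, assuming the language is passed in once and forwarded to the tokenizer. The `Analyzer` class shape, its `language` and `sent_tokenize` parameters, and the `naive_sent_tokenize` stand-in are all hypothetical and not the package's actual code. The stand-in exists only so the sketch runs without NLTK's punkt data.

```python
import re
from typing import Callable, List


def naive_sent_tokenize(text: str, language: str = "english") -> List[str]:
    """Stand-in with the same call shape as nltk.tokenize.sent_tokenize.

    It accepts `language` but ignores it, splitting on whitespace that
    follows '.', '!' or '?'. A real implementation would delegate to punkt.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class Analyzer:
    """Hypothetical analyzer that forwards a configurable language."""

    def __init__(
        self,
        language: str = "english",
        sent_tokenize: Callable[..., List[str]] = naive_sent_tokenize,
    ) -> None:
        self.language = language
        self._sent_tokenize = sent_tokenize

    def sentences(self, text: str) -> List[str]:
        # The only change from an English-only version: pass the language along.
        return self._sent_tokenize(text, language=self.language)


analyzer = Analyzer(language="spanish")
print(analyzer.sentences("Hola mundo. ¿Cómo estás? Bien."))
```

With NLTK and its punkt data installed, `nltk.tokenize.sent_tokenize` could be passed as the `sent_tokenize` argument, since it takes the same `language` keyword.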

Such a change would enable all of the current scorers except dale_chall. To support dale_chall properly, we would need its word list for each language. We could just ignore dale_chall for now.
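One way to "ignore dale_chall for now" is to skip it whenever no word list exists for the requested language. This is only a sketch under assumptions: the scorer names, the `EASY_WORD_LISTS` registry, and `available_scorers` are hypothetical, and the English entry is a tiny placeholder rather than the real Dale-Chall list.

```python
from typing import Dict, FrozenSet, List

# Illustrative scorer names; the package's real identifiers may differ.
ALL_SCORERS: List[str] = ["flesch_kincaid", "gunning_fog", "ari", "smog", "dale_chall"]

# Hypothetical registry of per-language word lists.
# The English set is a placeholder, not the actual Dale-Chall list.
EASY_WORD_LISTS: Dict[str, FrozenSet[str]] = {
    "english": frozenset({"the", "and", "cat"}),
}


def available_scorers(language: str) -> List[str]:
    """Return the scorers usable for `language`.

    dale_chall is dropped when no word list is registered for the language;
    every other scorer is kept.
    """
    scorers = []
    for name in ALL_SCORERS:
        if name == "dale_chall" and language not in EASY_WORD_LISTS:
            continue
        scorers.append(name)
    return scorers


print(available_scorers("english"))
print(available_scorers("spanish"))
```

Adding a language's word list to the registry would then re-enable dale_chall for it without touching the other scorers.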

If you have any thoughts, are interested in helping out, or would even like to submit a PR, I'd welcome it.

cdimascio commented 4 years ago

Looking for help on this. We certainly don't need to support all languages. If support can be added for at least one additional language, that would be a fantastic start!

wcc526 commented 4 years ago

Does it support Chinese?

cdimascio commented 4 years ago

It does not currently support Chinese. I'm looking for help if folks are interested. PRs are always welcome.

OanaIgnat commented 5 months ago

Any updates on this? Does it support other languages?