chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Add Portuguese readability score #263

Closed hugoabonizio closed 5 years ago

hugoabonizio commented 5 years ago

This PR adds Portuguese support for readability score.

Description

Since spaCy supports Portuguese, it's trivial to add support in textacy. The Flesch-Kincaid formula has its weights modified following these references:

Motivation and Context

Portuguese is already supported by spaCy but is lacking in textacy.

How Has This Been Tested?

To test it, I added another entry to the assertions in the test_flesch_reading_ease_langs test.

bdewilde commented 5 years ago

Hi @hugoabonizio , thanks for the PR. Code looks correct to me, although I'm surprised that the Portuguese equivalent is just a 42-point shift in value. 🤷‍♂ I'm going to try to merge this into the develop rather than master branch, since that's where I try to keep new features before cutting a new release.

Thanks again for the submission!

hugoabonizio commented 5 years ago

Hi @bdewilde, sorry for the confusion about branches. This shift seems to have been introduced in the following work and has been used since then.

MARTINS, T. B. et al. Readability formulas applied to textbooks in Brazilian Portuguese. [S.l.]: Icmsc-Usp, 1996.

Thanks for merging!
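
For anyone reading later, here is a minimal sketch of what the 42-point shift discussed above looks like in code. It assumes the Portuguese variant keeps the standard English Flesch Reading Ease weights (206.835, 1.015, 84.6) and only raises the constant by 42, which is how the change is described in this thread. The function name, signature, and constant names are hypothetical and chosen for illustration; this is not textacy's actual API.

```python
# Standard English Flesch Reading Ease constants.
EN_BASE = 206.835
SENTENCE_WEIGHT = 1.015   # weight on average words per sentence
SYLLABLE_WEIGHT = 84.6    # weight on average syllables per word

# Assumption: Portuguese only shifts the constant, as described in this thread.
PT_SHIFT = 42.0


def flesch_reading_ease(n_words, n_sentences, n_syllables, lang="en"):
    """Hypothetical helper: Flesch Reading Ease from raw counts."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("n_words and n_sentences must be positive")
    if lang not in ("en", "pt"):
        raise ValueError(f"unsupported language in this sketch: {lang!r}")
    base = EN_BASE + (PT_SHIFT if lang == "pt" else 0.0)
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return (
        base
        - SENTENCE_WEIGHT * words_per_sentence
        - SYLLABLE_WEIGHT * syllables_per_word
    )


# Same counts, two languages: the scores differ only by the shift.
score_en = flesch_reading_ease(100, 5, 150, lang="en")
score_pt = flesch_reading_ease(100, 5, 150, lang="pt")
```

With identical counts, the Portuguese score is simply the English score plus 42, which is why the diff in this PR is so small.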