henningko closed this issue 8 years ago
@henningko Good catch! Thanks, will push the fix today (I'm also releasing the latest version, finally!). Sorry about the delay in responding — I don't get issue notifications emailed to me for some reason, so I only stumbled onto this now. Will keep a closer eye on this.
Great work—stumbled across this while writing my own Python script for readability stats. Looking forward to Topic Modeling :)
Comparing our readability scores, I noticed a discrepancy in word count: your implementation counts far fewer words than mine.
Turns out that for calculating the readability stats in `textacy.text_stats`, you use the following line:
which probably should be:
Because `textacy.extract.words` defaults to `filter_stops=True` (which is a rather significant change to any text, so maybe the default should be False?), the number of words considered for the readability scores is reduced significantly, which renders the scores incorrect.
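To illustrate the effect, here is a minimal sketch in plain Python. It does not use textacy's code: the stop list, the sample text, and the helper names `tokenize` and `ari` are all made up for illustration, and the Automated Readability Index is used only because it needs no syllable counting.

```python
import re

# Tiny illustrative stop list (hypothetical; not the list textacy or spaCy uses).
STOP_WORDS = {"the", "a", "an", "of", "on", "and", "it", "was", "is", "to", "in"}


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def ari(words, n_sentences):
    """Automated Readability Index from a word list and a sentence count."""
    n_chars = sum(len(w) for w in words)
    n_words = len(words)
    return 4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43


text = "The cat sat on the mat. It was a sunny day and the cat was happy."
n_sentences = 2  # counted by hand for this sample

all_words = tokenize(text)                                       # 16 words
content_words = [w for w in all_words if w not in STOP_WORDS]    # 7 words

print("all words:    ", len(all_words), ari(all_words, n_sentences))
print("stops removed:", len(content_words), ari(content_words, n_sentences))
```

With stop words removed, both the word count and the average word length change, so the two ARI values differ even though the text is the same. Readability formulas are defined over all words, which is why the unfiltered count is the one to use.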