Readability stats use wrong word count due to stop list usage

Great work! I stumbled across this while writing my own Python script for readability stats. Looking forward to Topic Modeling :)

Comparing our readability scores, I noticed a discrepancy in word count: your implementation counts far fewer words than mine.

Turns out that for calculating the readability stats in textacy.text_stats, you use the following line:

 words = doc.words(filter_punct=True)

which probably should be:

 words = doc.words(filter_punct=True, filter_stops=False)

Because textacy.extract.words defaults to filter_stops=True, stop words are dropped before the readability scores are calculated. This significantly reduces the number of words considered and renders the scores incorrect. Filtering stop words is a rather significant change to any text, so maybe the default should be False?
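To illustrate the size of the effect, here is a minimal, self-contained sketch. It does not use textacy: the `words` helper, the tiny `STOP_WORDS` set, and the sample text are all hypothetical stand-ins, and the Automated Readability Index is used only because it needs nothing beyond character, word, and sentence counts.

```python
import re

# Tiny illustrative stop list (an assumption, NOT textacy's actual stop list).
STOP_WORDS = {"the", "a", "an", "of", "on", "and", "is", "it", "to", "in", "was"}


def words(text, filter_stops=False):
    """Hypothetical stand-in for a word extractor: punctuation is always dropped."""
    tokens = re.findall(r"[A-Za-z']+", text)
    if filter_stops:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def automated_readability_index(text, filter_stops=False):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43"""
    toks = words(text, filter_stops=filter_stops)
    if not toks:
        raise ValueError("text contains no words")
    # Crude sentence count: runs of terminal punctuation.
    n_sents = max(1, len(re.findall(r"[.!?]+", text)))
    n_chars = sum(len(t) for t in toks)
    return 4.71 * n_chars / len(toks) + 0.5 * len(toks) / n_sents - 21.43


text = "The cat sat on the mat. It was a sunny day in the garden."

print(len(words(text)))                     # 14 words with stop words kept
print(len(words(text, filter_stops=True)))  # 6 words once stop words are removed
print(automated_readability_index(text))
print(automated_readability_index(text, filter_stops=True))
```

With this toy stop list, more than half of the words disappear and the score shifts accordingly, which is the kind of discrepancy I was seeing.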

chartbeat-labs / textacy, issue #7