flesch kincaid statistics are both in error

promethean commented 10 years ago

Both the flesch_kincaid_reading_ease() and flesch_kincaid_grade_level() methods are maxing out: the first at 100 and the second at 19.

Every text block we try has the same issue, and the stats don't tally with those reported on readability-score.com.

Just FYI: maybe a recent commit has let a bug creep in?

DaveChild commented 10 years ago

A recent commit added normalisation to the various scoring systems. Each system has a minimum and a maximum, and the normalisation keeps scores within those ranges. The version of the code on Readability-Score.com has this normalisation disabled.
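As a rough illustration of what "keeping scores within a range" does, here is a minimal Python sketch (not the library's code). The function name normalize_score is hypothetical. The bounds are assumptions inferred from the capped values in the test results further down this thread: 0 and 100 for reading ease, and 0 and 12 for grade level.

```python
# Hypothetical bounds, inferred from capped values in this thread's test
# results; these are not confirmed constants from the library.
SCORE_RANGES = {
    "flesch_kincaid_reading_ease": (0.0, 100.0),
    "flesch_kincaid_grade_level": (0.0, 12.0),
}


def normalize_score(name: str, score: float, decimals: int = 1) -> float:
    """Clamp a raw score into its (min, max) range, then round it."""
    low, high = SCORE_RANGES[name]
    return round(min(max(score, low), high), decimals)


if __name__ == "__main__":
    # Kipling's expected raw reading ease of -187.5 is clamped up to 0.0.
    print(normalize_score("flesch_kincaid_reading_ease", -187.5))  # 0.0
    # Its expected raw grade level of 111.9 is clamped down to 12.0.
    print(normalize_score("flesch_kincaid_grade_level", 111.9))  # 12.0
```

Any text whose raw score falls outside a range reports the boundary value. That matches the "maxing out" behaviour described in the opening report.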

jrfnl commented 10 years ago

@DaveChild I thought I'd check the unit tests to be sure, but running them against both the code from before my changes and the current code base fails a lot of the tests. So either the tests need adjusting or something else is going on...

jrfnl commented 10 years ago

I've just run the unit tests against the code base at four different points in time and saved the results for your perusal in https://github.com/jrfnl/Text-Statistics/tree/unit-test-results

Results

Tests run on the code base of:	# of tests	Passed	Failed
2010-12-02	36	36	0
2011-12-12	37	30	7
2014-01-14	37	22	15
2014-02-11	37	19	18

By the looks of it some unit tests might need small adjustments in the assertions, but some also indicate that things are going wrong in the code... Let me know if I can be of assistance in fixing this.

jrfnl commented 10 years ago

For completeness, here's a summary of the actual test results. Each cell shows either "Passed" or the value the code actually returned:

File: TextStatisticsKiplingIf

Test	Expects	2010-12-02	2011-12-12	2014-01-14	2014-02-11	Stopped at first failing test:
KiplingSyllables	1	Passed	2	2	2	Line 312: 'Except'
WordCount	292	Passed	Passed	Passed	Passed
SentenceCount	1	Passed	Passed	Passed	Passed
TextLengthCheck	1125	Passed	Passed	Passed	Passed
FleschKincaidReadingEase	-187.5	Passed	-187.2	0	8.1
FleschKincaidGradeLevel	111.9	Passed	Passed	12	12
GunningFogScore	117.5	Passed	Passed	19	19
ColemanLiauIndex	6.9	Passed	Passed	Passed	12
SMOGIndex	14.1	Passed	Passed	12	12
AutomatedReadabilityIndex	142.7	Passed	Passed	12	12

File: TextStatisticsMelvilleMobyDick

Test	Expects	2010-12-02	2011-12-12	2014-01-14	2014-02-11	Stopped at first failing test:
KiplingSyllables	2	Passed	1	1	1	Line 68: 'Ishmael'
WordCount	201	Passed	Passed	Passed	Passed
LongWordCount	23/22	Passed	Passed	Passed	Passed
SentenceCount	8	Passed	Passed	Passed	Passed
TextLengthCheck	884	Passed	Passed	Passed	Passed
FleschKincaidReadingEase	53.4	Passed	53.8	53.8	100
FleschKincaidGradeLevel	12.1	Passed	12	12	12
GunningFogScore	14.4	Passed	Passed	Passed	19
ColemanLiauIndex	10.1	Passed	Passed	Passed	12
SMOGIndex	9.9	Passed	Passed	Passed	Passed
AutomatedReadabilityIndex	11.8	Passed	Passed	Passed	Passed

File: TextStatisticsTest

Test	Expects	2010-12-02	2011-12-12	2014-01-14	2014-02-11	Stopped at first failing test:
SyllableCountBasicWords	-	Passed	Passed	Passed	Passed
SyllableCountComplexWords	3	Passed	1	1	1	Line 132: 'CAPITALS'
SyllableCountProgrammedExceptions	-	Passed	Passed	Passed	Passed
AverageSyllablesPerWord	-	Passed	Passed	Passed	Passed
WordCount	-	Passed	Passed	Passed	Passed
CheckPercentageWordsWithThreeSyllables	-	Passed	Passed	Passed	Passed
TextLengthCheck	-	Passed	Passed	Passed	Passed
SentenceCount	-	Passed	Passed	Passed	Passed
AverageWordsPerSentence	-	Passed	Passed	Passed	Passed
FleschKincaidReadingEase	121.2	Passed	Passed	100	100	Line 223: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.'
FleschKincaidGradeLevel	-3.4	Passed	Passed	0	0	Line 232: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.'
GunningFogScore	0.4	Passed	Passed	Passed	1	Line 241: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.'
ColemanLiauIndex	13.6 / 3	Passed	Passed	12	12	Line 256: 'Now it is time for a more complicated sentence, including several longer words.' / Line 251: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.'
SMOGIndex	-	Passed	Passed	Passed	Passed
AutomatedReadabilityIndex	-5.6	Passed	Passed	0	0	Line 269: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.'
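The expected values for the 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.' input can be reproduced from the standard published formulas. This shows that the tests expect raw scores, while the normalised code returns boundary values. The counts below were made by hand: 12 words, 12 sentences, 12 syllables, 39 letters and no words of three or more syllables. The functions are an illustrative Python sketch, not the library's implementation.

```python
# Standard published readability formulas, computed from raw counts.
def flesch_reading_ease(words, sentences, syllables):
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    # complex_words: words with three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def automated_readability_index(letters, words, sentences):
    return 4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43


# Hand counts for 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.'
WORDS, SENTENCES, SYLLABLES, LETTERS, COMPLEX = 12, 12, 12, 39, 0

if __name__ == "__main__":
    print(round(flesch_reading_ease(WORDS, SENTENCES, SYLLABLES), 1))  # 121.2
    print(round(flesch_kincaid_grade(WORDS, SENTENCES, SYLLABLES), 1))  # -3.4
    print(round(gunning_fog(WORDS, SENTENCES, COMPLEX), 1))  # 0.4
    print(round(automated_readability_index(LETTERS, WORDS, SENTENCES), 1))  # -5.6
```

The raw reading ease (121.2) is above 100, and the raw grade level and ARI (-3.4 and -5.6) are below 0. The 100 and 0 reported in the 2014 columns are therefore consistent with those scores being clamped to their ranges.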

File: TextStatisticsTestCMULex

Test	Expects	2010-12-02	2011-12-12	2014-01-14	2014-02-11	Stopped at first failing test:
SyllableCountFailingCMUWords	3	N/A	2	2	2	Line 31: 'aaa'

DaveChild commented 10 years ago

Right you are, thanks. I'd assumed it was just down to rounding. Some of the failures are rounding differences, but some are genuine errors in the calculations.
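One way to tell rounding apart from a calculation error is to look at how sensitive a score is to its inputs. In the Flesch reading ease formula, one syllable more or less shifts the score by 84.6 divided by the word count. For the 292-word Kipling text that is about 0.29 points. A single miscounted syllable can therefore account for a gap like -187.5 versus -187.2, which is far larger than anything rounding to one decimal place would produce. This Python sketch only illustrates the arithmetic. It does not diagnose which word was miscounted, and the syllable total passed in is a hypothetical value.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch reading ease formula from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Kipling "If": 292 words in 1 sentence, per the test table in this thread.
WORDS, SENTENCES = 292, 1


def one_syllable_shift(total_syllables: int) -> float:
    """Score change when the (hypothetical) syllable total drops by one."""
    return (flesch_reading_ease(WORDS, SENTENCES, total_syllables - 1)
            - flesch_reading_ease(WORDS, SENTENCES, total_syllables))


if __name__ == "__main__":
    # Equals 84.6 / 292 regardless of the total chosen.
    print(round(one_syllable_shift(340), 2))  # 0.29
```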

DaveChild commented 10 years ago

OK, I think this is now fixed. The failure count will still be above zero, because one of the test files is essentially a large list of currently known errors. All the other test files now pass.

I've also added a flag to disable score normalization, for people who don't want their scores normalized.
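Here is a minimal Python sketch of how an opt-out flag like this might be wired in. The class TextScorer and its normalise parameter are hypothetical and not the library's actual API. The real flesch_kincaid_reading_ease() method takes text, whereas this one takes raw counts to keep the example self-contained. The 0 to 100 range is an assumption based on the capped values in this thread's test results.

```python
class TextScorer:
    """Hypothetical scorer with an opt-out flag for score normalisation."""

    def __init__(self, normalise: bool = True):
        self.normalise = normalise

    def flesch_kincaid_reading_ease(self, words: int, sentences: int,
                                    syllables: int) -> float:
        score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
        if self.normalise:
            score = min(max(score, 0.0), 100.0)  # clamp to the 0..100 range
        return round(score, 1)


if __name__ == "__main__":
    # 12 one-syllable words in 12 sentences, as in the unit-test sentence.
    print(TextScorer().flesch_kincaid_reading_ease(12, 12, 12))  # 100.0
    print(TextScorer(normalise=False).flesch_kincaid_reading_ease(12, 12, 12))  # 121.2
```

Defaulting the flag to on keeps existing callers' behaviour unchanged. Callers who want raw scores, for example to match Readability-Score.com, opt out explicitly.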

jrfnl commented 10 years ago

@DaveChild Excellent! Glad to hear my analysis helped.

DaveChild commented 10 years ago

Very much, thanks :).

I'm now working through that awful CMU list to see if I can add a few more rules to the syllable counter, or if it's going to require a lot of manual assignment of values.
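The trade-off between adding rules and assigning values by hand can be sketched as a heuristic counter backed by an exception table. This Python sketch is a rough illustration (vowel-group counting plus a silent-"e" rule), not the library's syllable counter. The function name count_syllables and the two EXCEPTIONS entries are hypothetical examples. The sketch returns 1 for 'aaa', where the CMU test expects 3, which shows the kind of word that rules alone are unlikely to fix.

```python
import re

# Hypothetical manual overrides for words the rules below get wrong.
EXCEPTIONS = {"recipe": 3, "simile": 3}

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Estimate syllables: exception table first, then vowel-group rules."""
    w = word.lower().strip(".,;:!?'\"")
    if not w:
        return 0
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    # Each run of consecutive vowels counts as one syllable.
    count = len(VOWEL_GROUPS.findall(w))
    # A final silent "e" usually adds nothing ("nice"), but a
    # consonant followed by "le" does ("table").
    if w.endswith("e") and not re.search(r"[^aeiouy]le$", w) and count > 1:
        count -= 1
    return max(count, 1)


if __name__ == "__main__":
    for w in ["Ishmael", "CAPITALS", "nice", "table", "recipe", "aaa"]:
        print(w, count_syllables(w))
```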

jrfnl commented 10 years ago

You are a star! Would you like me to keep the test result branch online, or shall I pull it down?

DaveChild / Text-Statistics

flesch kincaid statistics are both in error #17