promethean closed this issue 10 years ago.
A recent commit added normalisation of the various scoring systems. Each score has a defined minimum and maximum, and normalisation keeps scores within that range. The version of the code running on Readability-Score.com has this normalisation disabled.
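A minimal sketch of that kind of normalisation in Python, assuming it is a simple clamp of each score to its range. The `normalise_score` name is hypothetical, and the 0 to 100 range for Flesch-Kincaid Reading Ease is taken from the capped values (0 and 100) that appear in the test results below, not from the library source.

```python
def normalise_score(score: float, minimum: float, maximum: float) -> float:
    """Clamp a readability score into the closed range [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(score, maximum))


# Flesch-Kincaid Reading Ease, assumed here to be clamped to 0-100.
print(normalise_score(121.2, 0, 100))   # 100
print(normalise_score(-187.5, 0, 100))  # 0
print(normalise_score(53.4, 0, 100))    # 53.4
```

A clamp like this would explain why the 2014 columns report exact boundary values such as 0 and 100 where the tests expect out-of-range figures.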
@DaveChild Just thought I'd check the unit tests to be sure, but running them against both the code from before my changes and the current code base fails a lot of the tests. So either the tests need adjusting or something else is going on.
I've just run the unit tests against the code base at four different points in time and saved the results for your perusal in https://github.com/jrfnl/Text-Statistics/tree/unit-test-results
Results
Tests run on the code base of: | # of tests | Passed | Failed |
---|---|---|---|
2010-12-02 | 36 | 36 | 0 |
2011-12-12 | 37 | 30 | 7 |
2014-01-14 | 37 | 22 | 15 |
2014-02-11 | 37 | 19 | 18 |
By the looks of it, some unit tests may only need small adjustments to their assertions, but others indicate that something is going wrong in the code. Let me know if I can help fix this.
For completeness, here's a summary of the actual test results:
File: TextStatisticsKiplingIf
Test | Expects | 2010-12-02 | 2011-12-12 | 2014-01-14 | 2014-02-11 | Stopped at first failing test: |
---|---|---|---|---|---|---|
KiplingSyllables | 1 | Passed | 2 | 2 | 2 | Line 312: 'Except' |
WordCount | 292 | Passed | Passed | Passed | Passed | |
SentenceCount | 1 | Passed | Passed | Passed | Passed | |
TextLengthCheck | 1125 | Passed | Passed | Passed | Passed | |
FleschKincaidReadingEase | -187.5 | Passed | -187.2 | 0 | 8.1 | |
FleschKincaidGradeLevel | 111.9 | Passed | Passed | 12 | 12 | |
GunningFogScore | 117.5 | Passed | Passed | 19 | 19 | |
ColemanLiauIndex | 6.9 | Passed | Passed | Passed | 12 | |
SMOGIndex | 14.1 | Passed | Passed | 12 | 12 | |
AutomatedReadabilityIndex | 142.7 | Passed | Passed | 12 | 12 | |
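The hugely negative expected Reading Ease for the Kipling text follows from the standard Flesch formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). With 292 words in a single sentence (per the table above), the score is far below zero for any text with at least one syllable per word, so a 0 to 100 clamp can only ever return 0. This sketch uses the published coefficients; that the library uses exactly these is an assumption.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Kipling's "If" test text: 292 words in 1 sentence.
# Every word has at least one syllable, so syllables >= 292; more syllables
# only lower the score, so 292 syllables gives the highest possible value.
upper_bound = flesch_reading_ease(292, 1, 292)
print(round(upper_bound, 1))  # -174.1
```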
File: TextStatisticsMelvilleMobyDick
Test | Expects | 2010-12-02 | 2011-12-12 | 2014-01-14 | 2014-02-11 | Stopped at first failing test: |
---|---|---|---|---|---|---|
KiplingSyllables | 2 | Passed | 1 | 1 | 1 | Line 68: 'Ishmael' |
WordCount | 201 | Passed | Passed | Passed | Passed | |
LongWordCount | 23/22 | Passed | Passed | Passed | Passed | |
SentenceCount | 8 | Passed | Passed | Passed | Passed | |
TextLengthCheck | 884 | Passed | Passed | Passed | Passed | |
FleschKincaidReadingEase | 53.4 | Passed | 53.8 | 53.8 | 100 | |
FleschKincaidGradeLevel | 12.1 | Passed | 12 | 12 | 12 | |
GunningFogScore | 14.4 | Passed | Passed | Passed | 19 | |
ColemanLiauIndex | 10.1 | Passed | Passed | Passed | 12 | |
SMOGIndex | 9.9 | Passed | Passed | Passed | Passed | |
AutomatedReadabilityIndex | 11.8 | Passed | Passed | Passed | Passed | |
File: TextStatisticsTest
Test | Expects | 2010-12-02 | 2011-12-12 | 2014-01-14 | 2014-02-11 | Stopped at first failing test: |
---|---|---|---|---|---|---|
SyllableCountBasicWords | - | Passed | Passed | Passed | Passed | |
SyllableCountComplexWords | 3 | Passed | 1 | 1 | 1 | Line 132: 'CAPITALS' |
SyllableCountProgrammedExceptions | - | Passed | Passed | Passed | Passed | |
AverageSyllablesPerWord | - | Passed | Passed | Passed | Passed | |
WordCount | - | Passed | Passed | Passed | Passed | |
CheckPercentageWordsWithThreeSyllables | - | Passed | Passed | Passed | Passed | |
TextLengthCheck | - | Passed | Passed | Passed | Passed | |
SentenceCount | - | Passed | Passed | Passed | Passed | |
AverageWordsPerSentence | - | Passed | Passed | Passed | Passed | |
FleschKincaidReadingEase | 121.2 | Passed | Passed | 100 | 100 | Line 223: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.' |
FleschKincaidGradeLevel | -3.4 | Passed | Passed | 0 | 0 | Line 232: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.' |
GunningFogScore | 0.4 | Passed | Passed | Passed | 1 | Line 241: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.' |
ColemanLiauIndex | 13.6 / 3 | Passed | Passed | 12 | 12 | Line 256: 'Now it is time for a more complicated sentence, including several longer words.' / Line 251: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.' |
SMOGIndex | - | Passed | Passed | Passed | Passed | |
AutomatedReadabilityIndex | -5.6 | Passed | Passed | 0 | 0 | Line 269: 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.' |
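The expected values for 'This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each.' can be reproduced with the standard published formulas (12 words, 12 sentences, 12 one-syllable words, 39 letters). This Python check assumes the library uses these textbook coefficients; Coleman-Liau and SMOG are left out because the exact variants the tests expect are not shown here. Three of the four expected values lie outside 0 to 100 or below zero, which fits the normalised builds returning 100 or 0 instead.

```python
TEXT = "This. Is. A. Nice. Set. Of. Small. Words. Of. One. Part. Each."

words = len(TEXT.split())                   # 12
sentences = TEXT.count(".")                 # 12
letters = sum(ch.isalpha() for ch in TEXT)  # 39
syllables = words   # every word in this text has exactly one syllable
complex_words = 0   # no words of three or more syllables


def flesch_reading_ease(w, s, syl):
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade_level(w, s, syl):
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


def gunning_fog(w, s, cw):
    return 0.4 * ((w / s) + 100 * (cw / w))


def automated_readability_index(chars, w, s):
    return 4.71 * (chars / w) + 0.5 * (w / s) - 21.43


scores = {
    "FleschKincaidReadingEase": flesch_reading_ease(words, sentences, syllables),
    "FleschKincaidGradeLevel": flesch_kincaid_grade_level(words, sentences, syllables),
    "GunningFogScore": gunning_fog(words, sentences, complex_words),
    "AutomatedReadabilityIndex": automated_readability_index(letters, words, sentences),
}
for name, value in scores.items():
    print(f"{name}: {value:.1f}")
```

This prints 121.2, -3.4, 0.4 and -5.6, matching the Expects column for those four tests.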
File: TextStatisticsTestCMULex
Test | Expects | 2010-12-02 | 2011-12-12 | 2014-01-14 | 2014-02-11 | Stopped at first failing test: |
---|---|---|---|---|---|---|
SyllableCountFailingCMUWords | 3 | N/A | 2 | 2 | 2 | Line 31: 'aaa' |
Right you are. Thanks, I'd assumed it was just down to rounding. Some of the failures are rounding differences, but some are genuine calculation errors.
Ok, I think this is now fixed. The failure count will be above zero, as one of the test files is essentially a large list of current known errors. The other files all report correct results now for the unit tests.
I've also added a flag to disable score normalization, for people who don't want their scores normalized.
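A sketch of how such an opt-out flag might work, in Python. The `ReadabilityScorer` class, its `normalise` attribute, the `finalise_score` method and the `SCORE_RANGES` table are all hypothetical names chosen for illustration; they are not the library's actual API.

```python
from dataclasses import dataclass

# Hypothetical (min, max) ranges; 0-100 for Reading Ease matches the
# capped values seen in the test results.
SCORE_RANGES = {"flesch_kincaid_reading_ease": (0.0, 100.0)}


@dataclass
class ReadabilityScorer:
    normalise: bool = True  # hypothetical flag: set False to keep raw scores

    def finalise_score(self, name: str, raw: float) -> float:
        """Round a raw score, then clamp it to its range if normalising."""
        score = round(raw, 1)
        if self.normalise and name in SCORE_RANGES:
            low, high = SCORE_RANGES[name]
            score = max(low, min(score, high))
        return score


raw = 121.22
print(ReadabilityScorer().finalise_score("flesch_kincaid_reading_ease", raw))                 # 100.0
print(ReadabilityScorer(normalise=False).finalise_score("flesch_kincaid_reading_ease", raw))  # 121.2
```

Defaulting the flag to on keeps normalised output as the standard behaviour while letting callers who need the raw formula values opt out per instance.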
@DaveChild Excellent! Glad to hear my analysis helped.
Very much, thanks :).
I'm now working through that awful CMU list to see if I can add a few more rules to the syllable counter, or if it's going to require a lot of manual assignment of values.
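A rough Python sketch of the two approaches mentioned, general rules plus manually assigned values. The vowel-group rule, the silent-e adjustment and the `EXCEPTIONS` table are illustrative assumptions, not the library's syllable counter. A word like 'aaa' (which the CMU test expects to have 3 syllables) is the kind that only a manual entry will fix, and lowercasing first keeps capitalised input such as 'CAPITALS' from changing the count.

```python
import re

# Runs of consecutive vowels roughly approximate syllables.
VOWEL_GROUPS = re.compile(r"[aeiouy]+")

# Hypothetical manual overrides for words the rules get wrong.
EXCEPTIONS = {"aaa": 3}


def count_syllables(word: str) -> int:
    w = word.lower().strip(".,;:!?'\"")
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    count = len(VOWEL_GROUPS.findall(w))
    # Silent final 'e' ("make"), but keep consonant + "le" ("table").
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


print(count_syllables("CAPITALS"), count_syllables("Ishmael"), count_syllables("aaa"))  # 3 2 3
```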
You are a star! Would you like me to keep the test result branch online, or shall I pull it down?
Both the flesch_kincaid_reading_ease() and flesch_kincaid_grade_level() methods are maxing out: the former at 100 and the latter at 19.
Every text block we try has the same issue, and the stats don't tally with those on readability-score.com.
Just FYI: perhaps a recent commit has introduced a bug?