DaveChild / Text-Statistics

Generate information about text including syllable counts and Flesch-Kincaid, Gunning-Fog, Coleman-Liau, SMOG and Automated Readability scores.
https://readable.com/
BSD 2-Clause "Simplified" License

Numbers are not handled correctly #4

Open DaveChild opened 13 years ago

DaveChild commented 13 years ago

From Google Code:

Numbers written as digits within text (1, 20, 100, etc.) may not be handled correctly.

This is currently an open question: should "20" be counted as two syllables ("twen-ty") or as one syllable? Or should it be excluded from the calculations altogether?

getconor commented 11 years ago

I don't know if you are still wondering about this, or whether you want to follow the original calculations as specified by J. Peter Kincaid and his team in 1975, but the original paper[1] handles numbers as follows:

For the Flesch-Kincaid Reading Ease score, each number is counted as one word, and its syllables are counted as the number is pronounced: "20" is "twen-ty" (2 syllables) and "1918" is "nineteen eighteen" (4 syllables).[2]

For the Gunning Fog Index, numbers are considered "easy" words and get a score of one.[3]

The paper also covers a few more details of the calculations, such as currency symbols and percent signs.

[1] Kincaid, J.P., Fishburne, R.P., Rogers, R.L., & Chissom, B.S. (1975). Derivation of New Readability Formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease formula) for Navy Enlisted Personnel. Research Branch Report 8-75. Chief of Naval Technical Training: Naval Air Station Memphis.

[2] Kincaid 1975, p. 50.

[3] Kincaid 1975, p. 48.
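
The rules described in this comment could be sketched roughly as follows. This is only an illustrative Python sketch, not code from Text-Statistics: the names `number_syllables`, `cardinal_syllables`, `syllables_below_100` and `fog_is_easy_word` are hypothetical, the syllable counts are assumed English pronunciations, and the choice of when to read a four-digit number as two pairs ("nineteen eighteen") rather than as a cardinal ("two thousand five") is an assumption that the comment does not spell out. Only integers from 0 to 9999 are handled.

```python
# Assumed syllable counts for spoken English number words.
UNITS = [2, 1, 1, 1, 1, 1, 1, 2, 1, 1,   # zero .. nine
         1, 3, 1, 2, 2, 2, 2, 3, 2, 2]   # ten .. nineteen
TENS = {2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 3, 8: 2, 9: 2}  # twenty .. ninety
SCALE_SYLLABLES = 2  # both "hun-dred" and "thou-sand"


def syllables_below_100(n):
    """Syllables in the spoken form of 0..99."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] + (UNITS[unit] if unit else 0)


def cardinal_syllables(n):
    """Syllables in 0..9999 read as a cardinal, e.g. 'two thousand five'."""
    total = 0
    thousands, n = divmod(n, 1000)
    if thousands:
        total += UNITS[thousands] + SCALE_SYLLABLES
    hundreds, n = divmod(n, 100)
    if hundreds:
        total += UNITS[hundreds] + SCALE_SYLLABLES
    if n or total == 0:  # say the remainder, or "zero" if nothing was said
        total += syllables_below_100(n)
    return total


def number_syllables(token):
    """Syllables in a digit string, counted as the number is pronounced."""
    if not (token.isascii() and token.isdigit()):
        raise ValueError("expected a string of ASCII digits")
    n = int(token)
    if n > 9999:
        raise ValueError("this sketch only handles 0..9999")
    high, low = divmod(n, 100)
    # Assumption: four-digit numbers whose halves are both >= 10 are read
    # as two pairs, so "1918" becomes "nineteen eighteen".
    if high >= 10 and low >= 10:
        return syllables_below_100(high) + syllables_below_100(low)
    return cardinal_syllables(n)


def fog_is_easy_word(token, syllables):
    """Numbers always count as easy; for other words this assumes the
    usual Gunning Fog cutoff of fewer than three syllables."""
    return token.isdigit() or syllables < 3


print(number_syllables("20"))    # twen-ty
print(number_syllables("1918"))  # nineteen eighteen
```

Under these assumptions "20" yields 2 syllables and "1918" yields 4, matching the examples quoted from the paper, while `fog_is_easy_word("1918", 4)` still treats the number as an easy word despite its syllable count.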

DaveChild commented 10 years ago

Thanks, this is great info. A little more than I have time to incorporate at the moment, but interesting for future development.