ejh243 / MunroeJargonProfiler

web app to analyse the jargon content of a text document
MIT License

Use different word lists for scoring #21

Open jonc125 opened 7 years ago

jonc125 commented 7 years ago

Neil suggested looking into lists of the words known by a typical 12-year-old, and similar lists.

jonc125 commented 7 years ago

http://testyourvocab.com/details has some interesting information, but I haven't found an easy-to-use list yet.

jonc125 commented 7 years ago

That page suggests http://www.kilgarriff.co.uk/bnc-readme.html, a set of word lists derived from the British National Corpus.

You'll probably want the lemma.num file, for the first 6,000 or so words. If you're looking for more advanced English, you'll probably want the all.num.o5 file, which contains more than 200,000 entries (although many of them are redundant).

However, there's no copyright notice on it, so while there's an implication of public domain, it's a grey area, and the author is now dead.

Or there's http://wordlist.aspell.net/12dicts-readme/ which is part public domain and part use-with-attribution.
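For reference, a rough sketch of how the first 6,000 or so words of a frequency-ranked file like lemma.num could be turned into a word list. This assumes each line is whitespace-separated as rank, frequency, word, word class; I haven't checked that against the actual file, so treat the column layout as an assumption. The function name `load_top_lemmas` and the sample lines are made up for illustration.

```python
from typing import Iterable, Set


def load_top_lemmas(lines: Iterable[str], limit: int = 6000) -> Set[str]:
    """Collect the first `limit` distinct words from a frequency-ranked list.

    Assumed line layout (not verified against the real file):
        rank frequency word word_class
    """
    words: Set[str] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            # Skip blank or malformed lines rather than failing.
            continue
        words.add(fields[2].lower())
        if len(words) >= limit:
            break
    return words


# Invented sample lines in the assumed layout, not real corpus counts.
sample = ["1 100 the det", "2 90 be v", "3 80 of prep"]
print(sorted(load_top_lemmas(sample, limit=2)))  # ['be', 'the']
```

With a real file this would be called as `load_top_lemmas(open(path, encoding="utf-8"))`, where `path` points at wherever the list was downloaded to.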

jonc125 commented 7 years ago

Neil suggested:

ejh243 commented 7 years ago

The Basic English word list used by Wikipedia: http://ogden.basic-english.org/words.html
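A minimal sketch of what switching word lists for scoring might look like, assuming the score is simply the fraction of tokens not found in the chosen list. This is not how the app currently works, just an illustration; `jargon_fraction`, the tokenizer regex, and the two tiny lists are all hypothetical.

```python
import re
from typing import Dict, Set


def jargon_fraction(text: str, known_words: Set[str]) -> float:
    """Return the fraction of tokens in `text` that are not in `known_words`."""
    # Very simple tokenizer: runs of lowercase letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    unknown = [t for t in tokens if t not in known_words]
    return len(unknown) / len(tokens)


# Hypothetical stand-ins for real lists (BNC lemmas, 12dicts, Basic English).
word_lists: Dict[str, Set[str]] = {
    "tiny_a": {"the", "cat", "sat", "on", "mat"},
    "tiny_b": {"the", "on"},
}

text = "The cat sat on the mat"
for name, words in word_lists.items():
    print(name, jargon_fraction(text, words))
# tiny_a 0.0
# tiny_b 0.5
```

Keeping each list as a plain set keyed by name would make it easy to add or drop a list later, depending on how the licensing questions work out.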