danielt998 / HanziToAnki

This is a program that takes a Chinese text as input and converts it to an Anki deck.
MIT License

Filter by frequency #4

Closed: james-s-w-clark closed this issue 2 years ago

james-s-w-clark commented 7 years ago

Filtering by frequency and/or by HSK level could make the extracted vocabulary more useful.

james-s-w-clark commented 7 years ago

Here are character frequency lists, in various file formats and for various corpora.

http://lingua.mtsu.edu/chinese-computing/statistics/
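A minimal sketch of how such a list could drive the filtering, in Python. It assumes a hypothetical tab-separated file whose first two columns are rank and character; the actual files at the link above may use a different layout or encoding. The function names are illustrative only and are not part of HanziToAnki.

```python
from __future__ import annotations

import io
from typing import Iterable


def load_frequency_ranks(lines: Iterable[str]) -> dict[str, int]:
    """Map each character to its rank.

    Assumes hypothetical 'rank<TAB>char[<TAB>...]' lines; header rows and
    malformed rows are skipped.
    """
    ranks: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # header or malformed row
        # Keep the first (best) rank if a character appears twice.
        ranks.setdefault(fields[1], int(fields[0]))
    return ranks


def filter_by_frequency(chars: Iterable[str], ranks: dict[str, int],
                        max_rank: int) -> list[str]:
    """Keep characters ranked at or above max_rank (1 = most frequent).

    Preserves first-seen order and drops duplicates. Characters missing from
    the list are treated as rare and dropped (a design choice, not a rule).
    """
    seen: set[str] = set()
    kept: list[str] = []
    for ch in chars:
        if ch in seen:
            continue
        seen.add(ch)
        rank = ranks.get(ch)
        if rank is not None and rank <= max_rank:
            kept.append(ch)
    return kept


if __name__ == "__main__":
    sample = io.StringIO("Rank\tChar\n1\t的\n2\t一\n3\t是\n500\t龙\n")
    ranks = load_frequency_ranks(sample)
    print(filter_by_frequency("我是一条龙的", ranks, max_rank=3))
```

A lower bound (skipping characters a learner already knows) could be added the same way with a second cutoff.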

james-s-w-clark commented 2 years ago

I think this is a bit redundant, as the HSK vocab files kind of do this for us. It's not perfect, but it's good enough, so we can focus on other things that deliver more value faster.