Samyak2 / toipe

yet another typing test, but crab flavoured
MIT License

Larger wordlists #17

Status: Open. Samyak2 opened this issue 2 years ago.

Samyak2 commented 2 years ago

What and why?

Currently, the only built-in word list is the top 250 words list. This is very limiting, because words often repeat within the same line and many times over the course of a test.
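To put a rough number on how often repeats happen, here is a small birthday-problem calculation. It assumes, purely for illustration, that each word in a line is drawn independently and uniformly from the list and that a line holds 10 words; toipe's actual selection strategy and line length may differ. The function name `repeat_probability` is hypothetical.

```rust
/// Probability that a line of `n` words, each drawn independently and
/// uniformly at random from a list of `k` distinct words, contains at
/// least one repeated word (the classic "birthday problem").
/// Assumption: uniform sampling with replacement, which may not match toipe.
fn repeat_probability(k: u32, n: u32) -> f64 {
    // With more draws than distinct words, a repeat is guaranteed.
    if n > k {
        return 1.0;
    }
    // Multiply the chances that each successive word is new.
    let mut p_unique = 1.0_f64;
    for i in 0..n {
        p_unique *= (k - i) as f64 / k as f64;
    }
    1.0 - p_unique
}

fn main() {
    // Compare list sizes for a hypothetical 10-word line.
    for k in [250, 500, 1000, 5000] {
        println!(
            "k = {:>4}: P(repeat in 10 words) = {:.3}",
            k,
            repeat_probability(k, 10)
        );
    }
}
```

Under these assumptions a 10-word line from the 250-word list has roughly a one-in-six chance of containing a repeat, while the same line from a 5000-word list repeats less than 1% of the time.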

It would be nice to have these word lists too:

How?

More info about the existing word list: https://docs.rs/toipe/latest/toipe/wordlists/constant.TOP_250.html

The word list needs to be added to this directory: https://github.com/Samyak2/toipe/tree/main/src/word_lists

and it needs to be listed here: https://github.com/Samyak2/toipe/blob/main/src/wordlists.rs
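As a rough idea of what "listing" a new word list can look like, here is a hypothetical sketch of a registry enum. The names `BuiltinWordlist`, `contents` and `words` are illustrative and are not taken from toipe's `wordlists.rs`. A real crate would typically embed each file from `src/word_lists` with Rust's `include_str!` macro; the contents are inlined here so the example compiles on its own.

```rust
/// Built-in word lists known to the binary (hypothetical names,
/// not toipe's actual API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuiltinWordlist {
    Top250,
    Top500,
}

impl BuiltinWordlist {
    /// Raw, newline-separated contents of the list. In a real crate this
    /// would be something like `include_str!("word_lists/top250")`;
    /// tiny inline strings stand in for the files here.
    fn contents(self) -> &'static str {
        match self {
            BuiltinWordlist::Top250 => "the\nof\nand\n",
            BuiltinWordlist::Top500 => "the\nof\nand\nplace\n",
        }
    }

    /// Splits the contents into individual words, skipping blank lines.
    fn words(self) -> Vec<&'static str> {
        self.contents()
            .lines()
            .map(str::trim)
            .filter(|w| !w.is_empty())
            .collect()
    }
}

fn main() {
    // Adding a list means one new variant plus one new match arm.
    for list in [BuiltinWordlist::Top250, BuiltinWordlist::Top500] {
        println!("{:?}: {} words", list, list.words().len());
    }
}
```

Keeping every list behind a single `match` means the compiler flags any variant that was added without its contents.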

Samyak2 commented 2 years ago

The source I was using offered only 5000 words for free. I added word lists ranging from 500 to 5000 words in 7c049c5acdf736958f4db155811edfaa9c9cdb8c, which will ship in v0.4.0.

benliepert commented 2 years ago

I think the word list size is misleading: textgen.rs only uses words between 2 and 8 characters long. For example, 927 words in the 5000-word list don't meet this criterion (925 of the 927 are longer than 8 characters). Maybe you could let the word length range be specified as a parameter (but default to 2 to 8)?
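To illustrate the point, here is a small sketch that counts how many words a length filter drops, split into too-short and too-long. It assumes both bounds are inclusive and that length is measured in Unicode characters; textgen.rs may measure bytes instead, which gives the same result for ASCII words. The helper names `is_selectable` and `count_excluded` are hypothetical. By the figures above, the 5000-word list effectively offers 5000 - 927 = 4073 usable words under the default 2 to 8 filter.

```rust
/// Returns true if `word` passes an inclusive [min_len, max_len] length filter.
/// Length is counted in chars (an assumption; textgen.rs may use bytes).
fn is_selectable(word: &str, min_len: usize, max_len: usize) -> bool {
    let n = word.chars().count();
    (min_len..=max_len).contains(&n)
}

/// Counts the words a filter would drop, as (too_short, too_long).
fn count_excluded(words: &[&str], min_len: usize, max_len: usize) -> (usize, usize) {
    let mut too_short = 0;
    let mut too_long = 0;
    for w in words {
        let n = w.chars().count();
        if n < min_len {
            too_short += 1;
        } else if n > max_len {
            too_long += 1;
        }
    }
    (too_short, too_long)
}

fn main() {
    // Tiny stand-in list: two 1-letter words, two in range, one too long.
    let words = ["a", "i", "the", "computer", "information"];
    let (short, long) = count_excluded(&words, 2, 8);
    let kept: Vec<&str> = words
        .iter()
        .copied()
        .filter(|w| is_selectable(w, 2, 8))
        .collect();
    println!(
        "{} listed, {} usable ({} too short, {} too long): {:?}",
        words.len(),
        kept.len(),
        short,
        long,
        kept
    );
}
```

Running the same count over a real list would show its effective size, which is the number that matters for how often words repeat.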

Samyak2 commented 2 years ago


Good catch! The 2 to 8 character filter was quite arbitrary. Adding --min-length and --max-length flags to specify the range would be nice, although it will take a bit of work to make RawWordSelector store the ToipeConfig as well.
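One possible shape for this is sketched below: a selector that keeps a copy of the length settings and applies them once, when the list is loaded. It does not reproduce toipe's RawWordSelector or ToipeConfig; `LengthConfig`, `FilteredWordSelector` and their methods are hypothetical stand-ins, and the defaults of 2 and 8 simply mirror the current filter. Parsing the actual --min-length and --max-length flags (for example with clap) is left out so the sketch needs only the standard library.

```rust
/// Hypothetical settings behind the proposed --min-length / --max-length
/// flags. Defaults mirror the current 2..=8 filter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LengthConfig {
    min_length: usize,
    max_length: usize,
}

impl Default for LengthConfig {
    fn default() -> Self {
        LengthConfig { min_length: 2, max_length: 8 }
    }
}

/// Hypothetical selector that stores the config alongside the filtered words.
struct FilteredWordSelector {
    config: LengthConfig,
    words: Vec<String>,
}

impl FilteredWordSelector {
    /// Filters a newline-separated word list by length (inclusive bounds).
    /// Rejects configs where the minimum exceeds the maximum.
    fn new(raw: &str, config: LengthConfig) -> Result<Self, String> {
        if config.min_length > config.max_length {
            return Err(format!(
                "--min-length ({}) must not exceed --max-length ({})",
                config.min_length, config.max_length
            ));
        }
        let words = raw
            .lines()
            .map(str::trim)
            .filter(|w| !w.is_empty())
            .filter(|w| {
                let n = w.chars().count();
                n >= config.min_length && n <= config.max_length
            })
            .map(String::from)
            .collect();
        Ok(FilteredWordSelector { config, words })
    }

    /// Number of words that survived the filter.
    fn word_count(&self) -> usize {
        self.words.len()
    }

    /// The settings this selector was built with.
    fn config(&self) -> LengthConfig {
        self.config
    }
}

fn main() {
    let raw = "a\nthe\ncomputer\ninformation\n";
    let selector = FilteredWordSelector::new(raw, LengthConfig::default())
        .expect("default bounds are valid");
    println!("{} words kept with {:?}", selector.word_count(), selector.config());
}
```

Filtering once at load time keeps the per-word selection path unchanged, at the cost of holding the filtered copy in memory.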