Open Samyak2 opened 2 years ago
How about storing it as a trie or Radix Tree
How about storing it as a trie or Radix Tree
@lucasace that's a good way to compress the data. Although, I'm not sure if it will help in selecting random words in a uniform manner from the list. Do you have any ideas about implementing that in a trie or a radix tree?
@Samyak2 I think my suggestion is a little wrong, even though a trie or radix tree might be faster in computation of the random word(not entirely sure of this), If im not wrong, what you want to do is find a way to store the word lists by reducing the size of the file
Taking inspiration from the trie maybe you could store words in the following fashion where the number represents the number of letters taken from the 0th word and the letters represent the letters to be added A example taken from below would be <5,t> which amounts to applet by taking the first 5 letters from apple Note:- // is used a comment
0, apple
5, t // applet
3, ear //appear
0, ball
3, d //bald
...
...
Now one drawback in storing it this way would be the extra computation of finding the most recent '0'th word in the series, though it might be that heavy it would be nice to avoid it altogether. A variation of the above in storing it could be
0, 0, apple
5, 1, t
3, 2, ear
0, 0, ball
3, 1, d
Where the second integer represents number of steps to the 0th word
And as for random words in tries this might help
What?
As new word lists get added (#17, 7c049c5acdf736958f4db155811edfaa9c9cdb8c, #27), the size of the final release binary increases since these are embedded directly into it.
Binary size does not matter much, but it would be nice to have it as small as reasonably possible.
How?
This issue has been opened to discuss possible ways to reduce the binary size further.
Some of my ideas:
Compression
The word lists could be compressed using a lightweight compression algo (TBD) at compile time i.e., the word lists will stay the same as they are, but will be embedded into the binary after compressing them. At run time, they will need to be decompressed before using them.
The compression library needs to be lightweight because I wouldn't want to add more bloat from the library than what it saves in compression xD.
It would also be nice to keep the syntax of including word lists similar to the existing approach -
include_str!("word_lists/top250")
i.e., through a macro.Host on the internet
A way to avoid embedding the word lists into the binary is to host them somewhere (probaby GitHub?) and have the app download them as required. This will reduce the baggage on the binary, but it's only useful when there are an extremely large number of large word lists. I'm also opposed to the idea of making what is a fully offline app into an internet connected app.
A custom format
This is a crazy one. What if toipe had its own custom format to store word lists in that is optimized only for storing a list of words. It will be a binary format, but I'm not sure of anything else.
Additional context
A change related to binary size was suggested before - #23 (fixed in #24)
To measure the binary size, use: