Open cgallegu opened 2 years ago
There's some progress here. I found a word2vec dataset that's a better fit for this game.
The downside is that the vectors have 400 dimensions, versus the 300 the current one has.
Because the html/js component keeps the secret word vector as part of the daily game, switching to the new dataset mid-day breaks the game. Cutting over at the start of a new game is an option; however, if something goes wrong, rolling back breaks the game. A third option is a two-phase deployment: first update the html/js component to handle vectors of both 300 and 400 dimensions, wait for the next game (to maximize the share of players who have the new version), and then switch to the new dataset. Done in that sequence, the individual changes are rollback-safe. A fourth option is finding a better dataset with 300d vectors and just using that. I haven't found one yet.
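To make phase one of the two-phase option concrete, here is a minimal sketch of a scoring function that accepts either vector size. It is written in Python for brevity; the html/js component would need the equivalent check. The names `SUPPORTED_DIMS` and `similarity_score` are hypothetical, and the score is assumed to be cosine similarity scaled by 100, which may not match the game's actual formula.

```python
import math

# Assumption: the two vector sizes discussed above (current and candidate dataset).
SUPPORTED_DIMS = {300, 400}


def similarity_score(guess, secret):
    """Return cosine similarity * 100 for two vectors of the same supported size.

    Hypothetical helper: it only shows that the scoring logic does not need to
    hard-code the number of dimensions.
    """
    if len(guess) != len(secret):
        # A client holding a 300d secret must never score a 400d guess.
        raise ValueError(
            f"dimension mismatch: guess={len(guess)}, secret={len(secret)}"
        )
    if len(secret) not in SUPPORTED_DIMS:
        raise ValueError(f"unsupported dimension: {len(secret)}")

    dot = sum(g * s for g, s in zip(guess, secret))
    norm = math.sqrt(sum(g * g for g in guess)) * math.sqrt(
        sum(s * s for s in secret)
    )
    if norm == 0:
        # Zero vectors carry no direction; treat them as unrelated.
        return 0.0
    return 100.0 * dot / norm
```

With a check like this deployed first, the later dataset switch only changes the data the client receives, so either step could be rolled back on its own.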
The current one has 1 million words. Some users have reported that the scoring experience is too different from the original Semantle: a random word will get you a similarity score of 10.
Maybe with a larger dataset this is not the case?
Or maybe we need a mechanism to pick secret words so that they're challenging enough. One way to do that might be to filter based on the similarity distribution: find secret words whose scores look more like a power-law distribution (a few very close words, most words far away) instead of a flat one.
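One way the filtering idea could be prototyped is sketched below. It does not fit a power law; as a simpler stand-in it compares the median score (what a random guess would typically get) with the mean of the top few scores. The function names and the thresholds `max_median` and `min_top_mean` are hypothetical and would need tuning against the real dataset. The vocabulary passed in is assumed not to contain the candidate itself.

```python
import math
import statistics


def cosine(a, b):
    """Plain cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_profile(candidate, vocab_vectors, top_k=10):
    """Summarize how the vocabulary scores against a candidate secret word."""
    scores = sorted(
        (100.0 * cosine(candidate, v) for v in vocab_vectors), reverse=True
    )
    return {
        "median": statistics.median(scores),        # typical random guess
        "top_mean": statistics.fmean(scores[:top_k]),  # closest neighbours
    }


def is_challenging(candidate, vocab_vectors,
                   max_median=5.0, min_top_mean=40.0, top_k=10):
    """Accept candidates whose scores are skewed: low median, high top scores.

    Thresholds are placeholders, not values taken from the game.
    """
    profile = similarity_profile(candidate, vocab_vectors, top_k)
    return (profile["median"] <= max_median
            and profile["top_mean"] >= min_top_mean)
```

Running `similarity_profile` over a sample of existing secret words would also show whether the "random word scores 10" complaint comes from the dataset as a whole or only from some secret words.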