find a larger dataset - Githubissues

A12Studios / semantle-es

Source code for "Semantle en español"

GNU General Public License v3.0


There's some progress here. I found a word2vec dataset that's a better fit for this game.

The downside is that the vectors have 400 dimensions, versus the 300 the current one has.

Because the html/js component keeps the secret word's vector as part of the daily game, switching to the new dataset mid-day breaks the game. Cutting over at the start of a new game is an option, but if something goes wrong, rolling back breaks the game. A third option is a two-phase deployment: first update the html/js component to handle vectors of both 300 and 400 dimensions, wait for the next game (to maximize the share of players who have the new version), and then switch to the new dataset. Done in that sequence, each individual change is rollback-safe. A fourth option is to find a better dataset with 300d vectors and just use that. I haven't found one yet.
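For the first phase of the two-phase deployment, here is a minimal sketch of what a dimension-tolerant similarity check in the js component could look like. It assumes the game scores guesses by cosine similarity between the secret word's vector and the guess's vector. The names `SUPPORTED_DIMENSIONS` and `cosineSimilarity` are hypothetical and not taken from the repo.

```javascript
// Hypothetical sketch: these names are illustrative, not from semantle-es.
// Assumption: vectors arrive as plain arrays of numbers.
const SUPPORTED_DIMENSIONS = new Set([300, 400]);

function cosineSimilarity(secretVec, guessVec) {
  // A cached 300d secret compared against a 400d guess (or the reverse)
  // is the mid-game breakage described above, so fail loudly.
  if (secretVec.length !== guessVec.length) {
    throw new RangeError(
      `Dimension mismatch: ${secretVec.length} vs ${guessVec.length}`
    );
  }
  if (!SUPPORTED_DIMENSIONS.has(secretVec.length)) {
    throw new RangeError(`Unsupported dimension: ${secretVec.length}`);
  }

  let dot = 0;
  let normSecret = 0;
  let normGuess = 0;
  // Loop over the actual length instead of a hard-coded 300.
  for (let i = 0; i < secretVec.length; i++) {
    dot += secretVec[i] * guessVec[i];
    normSecret += secretVec[i] * secretVec[i];
    normGuess += guessVec[i] * guessVec[i];
  }

  // Avoid dividing by zero for an all-zero vector.
  if (normSecret === 0 || normGuess === 0) {
    return 0;
  }
  return dot / Math.sqrt(normSecret * normGuess);
}
```

Since nothing here depends on a fixed vector length, this version could ship while the 300d dataset is still live, and rolling it back would not affect a game in progress. On a mismatch, the caller could catch the `RangeError` and ask the player to reload rather than show a wrong score.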


find a larger dataset #5