Closed by MrBoor 6 years ago
That would be very helpful for me as well
I would also very much appreciate it if you could publish the binary model. Thanks!
Yes it would be very useful
The English link you posted above contains only the word vectors, not the model .bin file, which is what we are asking for.
With the model file, we can create vectors for out-of-vocabulary words, but we can't do that with the word vectors alone.
Also interested in this. The .bin files for English would be very valuable.
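To illustrate the limitation: the text `.vec` file is just a lookup table (a `count dim` header line followed by one `word v1 ... vdim` line per word), so a word that is not listed has no vector. Below is a minimal sketch of reading that format; the sample contents and the `load_vec` and `lookup` helpers are made up for illustration and are not part of fastText.

```python
import io


def load_vec(stream):
    """Parse the text format: a 'count dim' header, then 'word v1 ... vdim' per line."""
    _, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:dim + 1]]
    return vectors, dim


# Tiny made-up file contents, for illustration only.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vectors, dim = load_vec(sample)


def lookup(word):
    """Return the stored vector, or None when the word is out of vocabulary."""
    return vectors.get(word)


print(lookup("hello"))   # a stored vector
print(lookup("helloo"))  # None: nothing in the table to build a vector from
```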
I would also be interested in the binary vectors.
Is there a reason why the .bin file will not be made open to the public?
It would be really helpful to be able to generate OOV word vectors for English words, but without the .bin file this would not be possible.
I found a link to an English .bin in the comments of #494: https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki-news-300d-1M-subword.bin.zip
Thank you maxfriedrich.
However, I think most of us would like to see the .bin file for the Common Crawl corpus. The link you provided only contains the vectors trained on Wikipedia and news data, not on Common Crawl.
I'm currently working on text classification tasks on Tweets, so it would be nice to have the Common Crawl vectors. Hope it will be published later.
Any update on this? I hope an admin will at least assign someone to answer our queries...
This is indeed strange. For non-English languages, the Common Crawl binaries are available, but for English (which is the most widely used) they are missing?
Just checking back in to see if there is any plan to release the Common Crawl version of the binaries for English. Any update?
Just bumping this. Checking whether we could get the binaries for Common Crawl.
Hi all,
Thank you for raising this issue.
The model trained on the Common Crawl data did not use subwords, and thus the binary model would not contain any more information than the text file that we released. In particular, this binary model could not be used to compute representations for out-of-vocabulary words. This is the reason why we did not release the binary model.
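For readers wondering why subwords matter here, the following is a rough sketch of the idea, not fastText's actual code. A subword model keeps a table of vectors for hashed character n-grams, and an unseen word gets the average of the vectors for its n-grams. The table size, dimension, random values, and the helper names (`char_ngrams`, `fnv1a`, `oov_vector`) are assumptions for illustration, and the hash is a plain 32-bit FNV-1a.

```python
import random


def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(wrapped) - n + 1)]


def fnv1a(text):
    """Plain 32-bit FNV-1a hash over the UTF-8 bytes of text."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


# Toy n-gram table with random values (a trained model would learn these).
DIM, BUCKETS = 4, 1000
rng = random.Random(0)
ngram_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]


def oov_vector(word, use_subwords=True):
    """Average the hashed n-gram vectors; None if the model has no subwords."""
    if not use_subwords:
        return None  # no n-gram table, so there is nothing to compose from
    rows = [ngram_table[fnv1a(g) % BUCKETS] for g in char_ngrams(word)]
    if not rows:
        return None  # word too short to produce any n-gram
    return [sum(col) / len(rows) for col in zip(*rows)]


print(char_ngrams("cat"))
print(oov_vector("helloo"))                      # composed from n-grams
print(oov_vector("helloo", use_subwords=False))  # None
```

Without the n-gram table, the second case is all that is left: the model holds one vector per known word, which is exactly what the text file already provides.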
We will likely release a model trained on crawl data with subwords in the near future (both binary and text models will be released).
Best, Edouard.
@EdouardGrave Hi Edo, any update on the subword model trained on Common Crawl?
Hello! I enjoy using your library and pretrained vectors. I see that for the vectors trained on Wikipedia you provide both the binary model and the pretrained vectors. However, for the vectors trained on Common Crawl, you only provide the pretrained vectors. Would it be possible for you to publish the binary model for them as well?
Thanks, Alexander.