Build out-of-vocabulary word from data.bin

Kyubyong / wordvectors

Pre-trained word vectors of 30+ languages

MIT License

Build out-of-vocabulary word from data.bin #19

Open binhna opened 6 years ago

binhna commented 6 years ago

Since the advantage of a subword model is that we can create vectors for new words from pre-trained characters, how can I create a new word vector from the data.bin file? Does that .bin file contain the characters and their vectors? Thanks.

adodge commented 6 years ago

The .bin files are fasttext model files. They're slightly out of date, but if you apply the script from https://github.com/Kyubyong/wordvectors/issues/14 you can use the fasttext program to generate word vectors for new words.
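For context on how a subword model can produce a vector for an unseen word, here is a minimal Python sketch of the general idea: wrap the word in `<` and `>`, split it into character n-grams, look each n-gram up in a hashed table, and average the results. Everything in it is an assumption made for illustration. The table is filled with random numbers rather than vectors read from a .bin file, `zlib.crc32` stands in for fastText's own hash function, and the names `char_ngrams`, `make_toy_table`, and `oov_vector` are hypothetical.

```python
import random
import zlib


def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of the word wrapped in '<' and '>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def make_toy_table(buckets=1000, dim=4, seed=0):
    """Build a toy n-gram table of random vectors (NOT trained weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(buckets)]


def oov_vector(word, table, minn=3, maxn=6):
    """Average the table rows of the word's n-grams to get a word vector."""
    grams = char_ngrams(word, minn, maxn)
    dim = len(table[0])
    if not grams:
        return [0.0] * dim
    total = [0.0] * dim
    for gram in grams:
        # crc32 is a stand-in hash, assumed here only to pick a bucket.
        row = table[zlib.crc32(gram.encode("utf-8")) % len(table)]
        for k in range(dim):
            total[k] += row[k]
    return [x / len(grams) for x in total]


table = make_toy_table()
print(char_ngrams("cat", 3, 3))
print(oov_vector("unseenword", table))
```

In a real model the table rows are learned during training, so words that share n-grams end up with related vectors.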

binhna commented 6 years ago

Thank you, but I don't understand how to use the script. I have the .bin file, your script, and the fasttext program. How exactly do I apply your script to generate vectors for new words?

binhna commented 6 years ago

Oh, I get it now. The first and second arguments to your script are the old and new .bin files, respectively. Once we have the new .bin file, we can use fasttext to generate an embedding for a new word. Thanks a lot for your script!
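For readers who prefer Python over the command-line program, the last step might look like the sketch below. It assumes the `fasttext` Python package is installed and that the converted model was saved as `new.bin` (a hypothetical file name). `fasttext.load_model` and `get_word_vector` are part of that package, while `load_model` and `vectors_for` are helper names made up for this example. Whether a converted file loads cleanly depends on the fastText version, so treat this as a starting point rather than a guaranteed recipe.

```python
def load_model(path):
    """Load a fastText .bin model; requires the `fasttext` package."""
    import fasttext  # imported lazily so the helper below works without it

    return fasttext.load_model(path)


def vectors_for(model, words):
    """Map each word to its vector, including out-of-vocabulary words."""
    return {word: model.get_word_vector(word) for word in words}


# Example usage (assumes new.bin exists in the working directory):
# model = load_model("new.bin")
# for word, vec in vectors_for(model, ["unseenword"]).items():
#     print(word, vec[:5])
```

The command-line equivalent is the `print-word-vectors` subcommand of the fasttext program, which reads query words from standard input.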

kusumlata123 commented 5 years ago

Hi, I am using the Hindi word2vec model hi.bin. When I look up vectors for words from my own corpus, some numbers such as 3740 (३७४०) are reported as out of vocabulary. What should I do about this?