attardi / deepnl

Deep Learning for Natural Language Processing
GNU General Public License v3.0

vocab.txt & vectors.txt structure #45

Open RaBa01 opened 7 years ago

RaBa01 commented 7 years ago

Hi, I built my model with gensim Word2Vec and I want to use it to train an NER model. Is there any documentation that explains the structure of vocab.txt and vectors.txt?

I used this command:

bin/dl-ner.py ner.dnn -t train+dev \
    --vocab vocab.txt --vectors vectors.txt \
    --caps --suffix --suffixes --gazetteer eng.list \
    -e 40 -l 0.01 -w 5 -n 300 -v

kjyong commented 6 years ago

I analyzed the code and converted a gensim word2vec model into input for deepnl. The format I used is shown below, and it works.

[vocab.txt]

word1
word2
word3
...

[vectors.txt]

2 3
word1 1.0 2.0 3.0
word2 2.3 3.5 1.2
...

The first line of vectors.txt gives the number of words followed by the vector dimension. Each following line holds a word and its vector components, separated by spaces.
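For anyone who wants to produce these two files from a script, here is a minimal sketch of the layout described above. It is not taken from the deepnl code. The function names `write_deepnl_embeddings` and `read_header` are hypothetical, and the input is plain Python lists so the example runs without gensim. With a gensim 4.x model, the words would presumably come from `model.wv.index_to_key` and each vector from `model.wv[word]`, but that part is an assumption and is not shown here.

```python
import os
import tempfile


def write_deepnl_embeddings(words, vectors, vocab_path, vectors_path):
    """Write words and vectors in the layout described in this thread.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if len(words) != len(vectors):
        raise ValueError("words and vectors must have the same length")
    dim = len(vectors[0]) if vectors else 0
    if any(len(vec) != dim for vec in vectors):
        raise ValueError("all vectors must have the same dimension")

    # vocab.txt: one word per line, in the same order as the vectors.
    with open(vocab_path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")

    # vectors.txt: header "<count> <dim>", then "<word> <v1> ... <vdim>".
    with open(vectors_path, "w", encoding="utf-8") as f:
        f.write(f"{len(words)} {dim}\n")
        for word, vec in zip(words, vectors):
            values = " ".join(str(float(x)) for x in vec)
            f.write(f"{word} {values}\n")


def read_header(vectors_path):
    """Return (word count, dimension) from the first line of vectors.txt."""
    with open(vectors_path, encoding="utf-8") as f:
        count, dim = f.readline().split()
    return int(count), int(dim)


# Toy data matching the example in this thread.
words = ["word1", "word2"]
vectors = [[1.0, 2.0, 3.0], [2.3, 3.5, 1.2]]

out_dir = tempfile.mkdtemp()
vocab_path = os.path.join(out_dir, "vocab.txt")
vectors_path = os.path.join(out_dir, "vectors.txt")

write_deepnl_embeddings(words, vectors, vocab_path, vectors_path)
print(read_header(vectors_path))  # prints (2, 3)
```

The vectors.txt layout here is the same as the word2vec text format (a count and dimension header, then one word per line with its values), so gensim's `save_word2vec_format(..., binary=False)` may already produce a usable vectors.txt. I have not checked that against deepnl itself.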