Closed honnibal closed 8 years ago
Thanks! Having GloVe vectors as default would be great.
We've now uploaded a binary-formatted version of the GloVe Common Crawl vectors. We've pruned the vocabulary to 1m entries instead of 3m. I would guess coverage on your task shouldn't be affected, but you'll need to evaluate your trained model to see. If need be, we can ship a full GloVe model for you and other users.
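One rough way to check whether the pruned vocabulary matters for a given task is to measure how much of the task's text it covers before retraining anything. The sketch below is only an illustration: `vocabulary_coverage`, `pruned_vocab` and `task_tokens` are hypothetical names and toy data, not part of spaCy or the GloVe release, and a coverage number is not a substitute for evaluating the trained model.

```python
from collections import Counter


def vocabulary_coverage(tokens, vocab):
    """Return (type_coverage, token_coverage) of `tokens` against `vocab`.

    type_coverage:  fraction of distinct words that appear in `vocab`
    token_coverage: fraction of running words that appear in `vocab`
    """
    counts = Counter(tokens)
    if not counts:
        return 0.0, 0.0
    known_types = sum(1 for word in counts if word in vocab)
    known_tokens = sum(n for word, n in counts.items() if word in vocab)
    return known_types / len(counts), known_tokens / sum(counts.values())


# Hypothetical toy data standing in for the pruned vector vocabulary
# and for the tokens of a task's corpus.
pruned_vocab = {"the", "cat", "sat", "on", "mat"}
task_tokens = "the cat sat on the mat near the zyzzyva".split()

type_cov, token_cov = vocabulary_coverage(task_tokens, pruned_vocab)
print("type coverage: %.2f, token coverage: %.2f" % (type_cov, token_cov))
```

Token coverage is usually the more relevant figure, since the words a pruned vocabulary drops tend to be rare ones.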
First, update your spaCy installation. You'll need v0.100.6. This will likely update several dependent packages as well.
pip install --upgrade spacy
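If you want to confirm the upgrade took effect, compare the installed version against the minimum numerically rather than as strings, because "0.99.0" sorts after "0.100.6" in a plain string comparison. A minimal sketch, assuming purely numeric dotted version strings; `parse_version` and `meets_minimum` are hypothetical helpers, not spaCy functions:

```python
def parse_version(text):
    """Turn a dotted version such as '0.100.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required="0.100.6"):
    """Return True if `installed` is at least `required`, compared numerically."""
    return parse_version(installed) >= parse_version(required)


# Tuples compare element by element, so 100 > 99 is handled correctly.
print(meets_minimum("0.100.6"))
print(meets_minimum("0.99.0"))
```

The installed version string itself can be read from the output of pip show spacy.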
We've written a little utility to download and install these data assets. You can use it to install the current model, and the GloVe vectors:
$ sputnik --name spacy install en
$ sputnik --name spacy install en_glove_cc_300_1m_vectors
Then in Python, load the en model with the GloVe vectors like so:
import spacy
nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
Hey,
Impressive system. I really regret not switching to GloVe vectors a long time ago. Thanks for putting up with the awkwardness of having to install extra vectors etc. We'll get this fixed.
The reason spaCy still ships with the Wikipedia vectors is sort of random. My plan since around May last year was to train POS-specific vectors, but I never got around to this until Trask et al. published their sense2vec paper. We finally published a demo of this recently (https://sense2vec.spacy.io). I might have a go at using these vectors in your system :).
Soon we'll have a command shipped to install the GloVe vectors. We'll then make these the default, and offer the previous ones as a backwards compatibility pack.