avisingh599 / visual-qa

[Reimplementation Antol et al 2015] Keras-based LSTM/CNN models for Visual Question Answering
https://avisingh599.github.io/deeplearning/visual-qa/
MIT License

We'll fix the spaCy vectors (make GloVe the default) #17

Closed honnibal closed 8 years ago

honnibal commented 8 years ago

Hey,

Impressive system. I really regret not switching to GloVe vectors a long time ago. Thanks for putting up with the awkwardness of having to install extra vectors etc. We'll get this fixed.

The reason spaCy still ships with the Wikipedia vectors is sort of random. My plan since around May last year was to train POS-specific vectors, but I never got around to it until Trask et al. published their sense2vec paper. We finally published a demo of this recently (https://sense2vec.spacy.io). I might have a go at using these vectors in your system :).

Soon we'll have a command shipped to install the GloVe vectors. We'll then make these the default, and offer the previous ones as a backwards compatibility pack.

avisingh599 commented 8 years ago

Thanks! Having GloVe vectors as default would be great.

honnibal commented 8 years ago

We've now uploaded a binary-formatted version of the GloVe Common Crawl vectors. We've pruned the vocabulary to 1m entries instead of 3m. I would guess coverage on your task shouldn't be affected, but you'll need to evaluate your trained model to see. If need be, we can ship a full GloVe model for you and other users.

First update your spaCy installation. You'll need v0.100.6. This will likely update several dependent packages as well.

pip install --upgrade spacy

We've written a little utility to download and install these data assets. You can use it to install the current model, and the GloVe vectors:

$ sputnik --name spacy install en
$ sputnik --name spacy install en_glove_cc_300_1m_vectors

Then in Python, load the en model with the GloVe vectors like so:

import spacy

nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
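
To get a rough idea of whether the pruned 1m vocabulary affects coverage on a given task, one option is to count how many of the words in the questions have a vector at all. The following is only a minimal sketch, not part of spaCy: `vector_coverage`, `toy_vocab` and the sample question are hypothetical names and data made up for illustration, and `has_vector` stands for whatever membership check the loaded model provides.

```python
from collections import Counter


def vector_coverage(tokens, has_vector):
    """Return (type_coverage, token_coverage) for a list of word strings.

    `has_vector` is any callable that maps a word to True or False.
    It is a placeholder for a real vocabulary lookup (an assumption).
    """
    counts = Counter(tokens)
    if not counts:
        return 0.0, 0.0
    covered = [word for word in counts if has_vector(word)]
    # Fraction of distinct words that have a vector.
    type_cov = len(covered) / len(counts)
    # Fraction of all word occurrences that have a vector.
    token_cov = sum(counts[word] for word in covered) / sum(counts.values())
    return type_cov, token_cov


# Toy stand-in for the pruned vector vocabulary (hypothetical data).
toy_vocab = {"what", "color", "is", "the", "dog"}
questions = "what color is the dog what is the frisbee".split()

type_cov, token_cov = vector_coverage(questions, toy_vocab.__contains__)
print("type coverage: %.3f, token coverage: %.3f" % (type_cov, token_cov))
```

Running the same count against the full and the pruned vocabulary would show how many task words were lost in pruning. It does not replace evaluating the trained model, which is still the real test.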