dhruvbatra closed 8 years ago
spaCy, the library I used for the NLP part, takes care of the missing space. Its tokenizer handles question marks, apostrophes, lowercasing, etc.
Hmm. True. Then there's something else going on, because I am seeing the last vector as all zeros.
Closing for now.
The last all-zero vector corresponds to the question mark.
Got it. It feels like an odd representation of that symbol, but thanks.
It's possible I'm mistaken, but it seems there's a bug in the way word embeddings are being computed in own_image.py
If question = "what color is the cat?"
the word "cat?" will be considered out of vocabulary (due to the lack of a space before the question mark) and its word embedding will be an all-zero vector.
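
For anyone landing here later, a minimal sketch of the behaviour discussed in this thread. It does not use spaCy or the code in own_image.py: the `TOY_VECTORS` table, the 3-dimensional vectors, and both tokenizer functions are hypothetical stand-ins, and the zero-vector fallback for unknown tokens is an assumption made to mirror what was observed above.

```python
import re

# Hypothetical toy embeddings (assumption); real models use far more dimensions.
DIM = 3
TOY_VECTORS = {
    "what": [0.1, 0.2, 0.3],
    "color": [0.4, 0.5, 0.6],
    "is": [0.7, 0.8, 0.9],
    "the": [0.2, 0.1, 0.0],
    "cat": [0.9, 0.4, 0.5],
}


def embed(tokens):
    """Look up each token; out-of-vocabulary tokens get an all-zero vector."""
    return [TOY_VECTORS.get(tok, [0.0] * DIM) for tok in tokens]


def whitespace_tokenize(text):
    # Naive splitting: "cat?" stays glued together as one token.
    return text.lower().split()


def punctuation_aware_tokenize(text):
    # Words and individual punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())


question = "what color is the cat?"
naive = whitespace_tokenize(question)
aware = punctuation_aware_tokenize(question)

print(naive, embed(naive)[-1])
print(aware, embed(aware)[-2], embed(aware)[-1])
```

With naive whitespace splitting the last token is "cat?", which is missing from the toy table. With punctuation-aware splitting, "cat" gets its own vector and only the trailing "?" falls back to zeros, which matches the all-zero last vector described above.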