facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

Changing the embedding dimension size #176

Closed Arjunsankarlal closed 6 years ago

Arjunsankarlal commented 6 years ago

Hi,

Instead of GloVe or FastText word embeddings I am using ELMo, which generates embeddings of size either 512 or 1024. I have made a few modifications to the load_embeddings function in /reader/train.py. To train the new model I passed --embedding-dim 512, but it still throws this error:

RuntimeError: The expanded size of the tensor (300) must match the existing size (512) at non-singleton dimension 0

It works fine up to the conversion of the vector to a tensor. The error occurs at:

embedding[self.word_dict[word]].copy_(vec)

I also changed the default embedding-dim parameter value from 300 to 512. Any lead would be helpful.
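For context, here is a simplified sketch (not the exact DrQA code; the dictionary and file line are made up for illustration) of how an embedding file is loaded into the pre-allocated lookup table. The table width comes from --embedding-dim, so a 300-d GloVe row cannot be copied into a 512-wide slot, which is exactly the mismatch the check in train.py guards against.

```python
# Simplified sketch (not the exact DrQA code) of loading an embedding file
# into a pre-allocated lookup table whose width comes from --embedding-dim.
import torch

embedding_dim = 512                      # value passed via --embedding-dim
word_dict = {'<NULL>': 0, 'the': 1}      # toy dictionary for illustration
embedding = torch.zeros(len(word_dict), embedding_dim)

# Pretend this line came from an embedding file; GloVe rows have 300 values.
line = 'the ' + ' '.join(['0.1'] * 300)
parsed = line.rstrip().split(' ')
word, vec = parsed[0], torch.Tensor([float(x) for x in parsed[1:]])

if vec.size(0) != embedding_dim:
    # This is the kind of check train.py performs before copying.
    print('Skipping %s: got %d-d vector, expected %d-d' % (word, vec.size(0), embedding_dim))
else:
    embedding[word_dict[word]].copy_(vec)  # only safe when the sizes match
```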

ajfisch commented 6 years ago

I'm a bit confused as to how you're approaching this. The code here is set up to read in non-contextual embeddings to populate a lookup table. ELMo is contextual -- it should be used to embed sentences. It's not hard to set this up as an alternative to nn.Embedding() using the AllenNLP interface.

Anyway, the error suggests that you are loading 300-d vectors from an embedding file while the code expects 512-d ones, so maybe you are still using GloVe. Also, did you override some of the checks in train.py? They should have prevented this from happening.
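For reference, a minimal sketch of the AllenNLP interface mentioned above: ELMo embeds whole token sequences, so it would replace the nn.Embedding() lookup rather than populate it. The file paths below are placeholders for the pretrained ELMo options/weights downloaded from AllenNLP.

```python
# Minimal sketch: using AllenNLP's Elmo module to embed token sequences
# instead of looking up static vectors in nn.Embedding().
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = 'elmo_options.json'   # placeholder path to pretrained options
weight_file = 'elmo_weights.hdf5'    # placeholder path to pretrained weights
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

sentences = [['Who', 'wrote', 'Hamlet', '?']]
character_ids = batch_to_ids(sentences)          # (batch, seq_len, 50) char ids
output = elmo(character_ids)
embeddings = output['elmo_representations'][0]   # (batch, seq_len, 1024)
```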

Arjunsankarlal commented 6 years ago

Hey @ajfisch,

Thanks for spotting that so clearly! I was missing that check; training has now started. I agree that ELMo generates contextual vectors: it produces three vectors for every word in a sentence, and the first vector doesn't vary with the context. Hence I'm trying to use those and compare the model's performance against GloVe and FastText.
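For illustration, a rough sketch of pulling out that first, context-independent layer with allennlp's ElmoEmbedder (assuming the default pretrained weights; layer 0 is the character-CNN output, layers 1-2 are the contextual biLSTM layers):

```python
# Rough sketch: extracting the context-independent layer 0 from ELMo
# with allennlp's ElmoEmbedder (default pretrained weights assumed).
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()                                  # downloads default options/weights
layers = elmo.embed_sentence(['Who', 'wrote', 'Hamlet', '?'])
# layers has shape (3, num_tokens, 1024); layers[0] does not depend on context,
# so it can be written out like a static embedding file, one vector per word.
static_vectors = layers[0]
```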

ajfisch commented 6 years ago

Sure, the first vector will just be the output of the non-contextual character CNNs. I doubt this will give you much benefit over just GloVe or FastText (maybe concatenating will give +1 F1, which is consistent with models in the literature that have added character-level information).

The real improvement will be with the full ELMo representations (+~5 F1).