@Greenbald someone else might be able to answer your question more directly, but I thought this tutorial (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#using-elmo-with-existing-allennlp-models) might provide some helpful tips.
I appreciate the answer, @schmmd. However, I don't quite understand how to use the .jsonnet style. The tutorial you mentioned points to a configuration file for the SRL task, which is not my case; I would appreciate it if anyone could point me to how to adapt the jsonnet scheme to other types of architectures, such as one with a final fully connected layer (`self.fc`) like in the code from my question.

I would like to use the contextual embeddings, i.e., the softmax-weighted representations, for a downstream task. It seems logical to me that I could incorporate the `Elmo` class from `allennlp.modules.elmo`, such that it becomes a "layer" in a bigger model that includes my downstream task and is fine-tuned at the same time. However, I don't know if this is correct.
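To make it concrete, here is a minimal sketch of what I have in mind (the options/weights file paths and `num_classes` are placeholders, and the single linear classifier is just illustrative):

```python
import torch
import torch.nn as nn
from allennlp.modules.elmo import Elmo

class MyModel(nn.Module):
    def __init__(self, options_file, weight_file, num_classes):
        super().__init__()
        # Elmo is an nn.Module: its scalar-mix weights are learned, and the
        # biLM itself can also be fine-tuned by passing requires_grad=True.
        self.elmo = Elmo(options_file, weight_file,
                         num_output_representations=1, dropout=0.5)
        # 1024 is the output dimension of the standard pretrained ELMo model.
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, character_ids):
        out = self.elmo(character_ids)
        x = out['elmo_representations'][0]  # (batch, timesteps, 1024)
        x = torch.mean(x, dim=1)            # mean over all timesteps, padding included
        return self.fc(x)
```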
After discussing with the team: your code example looks good, but you should be sure to use a mask when you compute the mean of the words in the sentence.
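For example, something along these lines, using the `mask` that the `Elmo` forward pass returns (a sketch; `out` here is the dictionary returned by `self.elmo(character_ids)`):

```python
x = out['elmo_representations'][0]        # (batch, timesteps, 1024)
mask = out['mask'].unsqueeze(-1).float()  # (batch, timesteps, 1), 1 for real tokens
summed = (x * mask).sum(dim=1)            # padded positions contribute zero
lengths = mask.sum(dim=1).clamp(min=1.0)  # token counts; avoid division by zero
x = summed / lengths                      # (batch, 1024) masked mean
```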
I see! You're totally right, I forgot that. Thanks a lot! :)
I also want to say that your team has done amazing work!
One last question, @schmmd: for `batch_to_ids(sentences)` from `allennlp.modules.elmo`, do the sentences need to be all the sentences in my corpus, or can I call it iteratively, one batch of `batch_size` sentences at a time?
Just one batch of sentences.
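For example (the sentences here are just illustrative, and must already be tokenized):

```python
from allennlp.modules.elmo import batch_to_ids

# One mini-batch of pre-tokenized sentences, not the whole corpus.
batch = [['I', 'love', 'NLP', '.'],
         ['Another', 'sentence']]
character_ids = batch_to_ids(batch)  # (batch_size, max_len, 50) character ids
```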
**Question**
I'm trying to use the ELMo biLM as part of a bigger model for my specific task. I'm not sure if what I'm doing is correct, but my understanding is that the `Elmo(...)` class is a PyTorch module and therefore has trainable parameters, so that in the `forward()` function it will train along with all the other parameters I have below. Is that right? The line `x = torch.mean(x, dim=1)` is where I take the mean of the word embeddings to get the sentence embedding, and the final fully connected layer makes the prediction for my task. Is it right to train with the mean of the word embeddings before the loss function is even computed? And is the linear combination of all the ELMo representations being computed?
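In other words, I want to confirm that the forward pass already returns the learned softmax-weighted combination of the biLM layers, so that I don't have to mix the layers myself. A sketch of my understanding (variable names are illustrative):

```python
out = self.elmo(character_ids)
# My understanding: 'elmo_representations' is already the softmax-normalized
# linear combination (scalar mix) of the biLM layers, scaled by a learned
# gamma, so no manual mixing of the layers should be needed.
x = out['elmo_representations'][0]  # (batch, timesteps, 1024)
```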