@Greenbald someone else might be able to answer your question more directly, but I thought this tutorial (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#using-elmo-with-existing-allennlp-models) might provide some helpful tips.
I appreciate the answer, @schmmd. However, I don't quite understand how to use the .jsonnet style. The tutorial you mentioned points to a configuration file for the SRL task, which is not my case; I would appreciate it if anyone could point me to how to adapt the jsonnet scheme to other types of architectures, such as one with a final fully connected layer (`self.fc`) like in the code from my question.

I would like to use the contextual embeddings, i.e., the softmax-weighted representations, for a downstream task. It seems logical to me that I could incorporate the `Elmo` class from `allennlp.modules.elmo`, such that it becomes a "layer" in a bigger model that includes my downstream task and is fine-tuned at the same time. However, I don't know if this is correct.
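To make it concrete, here is a minimal sketch of what I have in mind (the options/weights file paths and `num_classes` are placeholders, and the single linear classifier is just illustrative):

```python
import torch
import torch.nn as nn
from allennlp.modules.elmo import Elmo

class MyModel(nn.Module):
    def __init__(self, options_file, weight_file, num_classes):
        super().__init__()
        # Elmo is an nn.Module: its scalar-mix weights are learned, and the
        # biLM itself can also be fine-tuned by passing requires_grad=True.
        self.elmo = Elmo(options_file, weight_file,
                         num_output_representations=1, dropout=0.5)
        # 1024 is the output dimension of the standard pretrained ELMo model.
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, character_ids):
        out = self.elmo(character_ids)
        x = out['elmo_representations'][0]  # (batch, timesteps, 1024)
        x = torch.mean(x, dim=1)            # mean over all timesteps, padding included
        return self.fc(x)
```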
After discussing with the team: your code example looks good, but you should be sure to use a mask when you compute the mean of the words in the sentence.
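For example, something along these lines, using the `mask` that the `Elmo` forward pass returns (a sketch; `out` here is the dictionary returned by `self.elmo(character_ids)`):

```python
x = out['elmo_representations'][0]        # (batch, timesteps, 1024)
mask = out['mask'].unsqueeze(-1).float()  # (batch, timesteps, 1), 1 for real tokens
summed = (x * mask).sum(dim=1)            # padded positions contribute zero
lengths = mask.sum(dim=1).clamp(min=1.0)  # token counts; avoid division by zero
x = summed / lengths                      # (batch, 1024) masked mean
```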
I see! You're totally right, I forgot that. Thanks a lot! :)
I also want to say that your team has done amazing work!
One last question, @schmmd: for `batch_to_ids(sentences)` from `allennlp.modules.elmo`, do the sentences need to be all the sentences in my corpus, or can I call it iteratively, one batch of `batch_size` sentences at a time?
Just one batch of sentences.
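For example (the sentences here are just illustrative, and must already be tokenized):

```python
from allennlp.modules.elmo import batch_to_ids

# One mini-batch of pre-tokenized sentences, not the whole corpus.
batch = [['I', 'love', 'NLP', '.'],
         ['Another', 'sentence']]
character_ids = batch_to_ids(batch)  # (batch_size, max_len, 50) character ids
```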
**Question**
I'm trying to use the ELMo biLM as part of a bigger model for my specific task. I'm not sure if what I'm doing is correct, but my understanding is that the `Elmo(...)` class is a PyTorch module and therefore has trainable parameters, so that in the `forward()` function it will train along with all the other parameters I have below. Is that right? The line `x = torch.mean(x, dim=1)` is where I take the mean of the word embeddings to get the sentence embedding, and the final fully connected layer makes the prediction for my task. Is it right to train with the mean of the word embeddings before the loss function is even computed? And is the linear combination of all the ELMo representations being computed?
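In other words, I want to confirm that the forward pass already returns the learned softmax-weighted combination of the biLM layers, so that I don't have to mix the layers myself. A sketch of my understanding (variable names are illustrative):

```python
out = self.elmo(character_ids)
# My understanding: 'elmo_representations' is already the softmax-normalized
# linear combination (scalar mix) of the biLM layers, scaled by a learned
# gamma, so no manual mixing of the layers should be needed.
x = out['elmo_representations'][0]  # (batch, timesteps, 1024)
```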