Closed rodrigogoulartsilva closed 3 years ago
Would you like to submit this as a PR so that you get added as a contributor?
I think the model should be moved to the device when `encode_sentences()` is called (for consistency with other models, and also so that GPU memory is not taken up if the model is never used).
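The lazy-placement idea above can be sketched without any framework code. This is a minimal illustration with hypothetical names (`RepresentationModelSketch`, `DummyModel` are stand-ins, not the library's actual classes): the constructor only records the target device, and the model is moved on first use inside `encode_sentences()`.

```python
class DummyModel:
    """Stand-in for a torch model; records which device it lives on."""
    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        # Mimics torch's Module.to(): moves the model, returns self.
        self.device = device
        return self


class RepresentationModelSketch:
    def __init__(self, use_cuda=False):
        # Only record the target device here; do NOT move the model yet,
        # so GPU memory is not reserved for models that are never used.
        self.device = "cuda" if use_cuda else "cpu"
        self.model = DummyModel()

    def encode_sentences(self, text_list):
        # Move the model on first use (repeating .to() on the same
        # device is a no-op, so calling it every time is safe).
        self.model.to(self.device)
        return [f"embedding({t})" for t in text_list]


m = RepresentationModelSketch(use_cuda=True)
print(m.model.device)           # "cpu" — nothing moved at construction time
m.encode_sentences(["hello"])
print(m.model.device)           # "cuda" — moved on first encode call
```

With a real `torch.nn.Module` the pattern is the same, since `Module.to()` is idempotent with respect to the device.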
Of course! It would be awesome to contribute to such a cool project! Pull request created as requested. Please let me know if you think it needs adjustments.
Hi folks,
First of all, awesome job with the library! I have been looking into the `RepresentationModel` class and noticed that even though there is an init parameter to use CUDA, the device is only assigned to `self.device`: the model itself is never moved to CUDA, and `self.device` is never used in `encode_sentences`, so everything defaults to the CPU.
I was able to fix it by adding the following in the constructor:

```python
self.model.to(self.device)
```

I also changed the `encode_sentences` method to use the device, with some adjustments to return the embeddings as a numpy array. Here is the working method:
```python
def encode_sentences(self, text_list, combine_strategy=None, batch_size=32):
    """
    Generates list of contextual word or sentence embeddings using the
    model passed to the class constructor.

    :param text_list: list of text sentences
    :param combine_strategy: strategy for combining word vectors,
        supported values: None, "mean", "concat"
    :param batch_size: batch size
    :return: list of lists of sentence embeddings (if combine_strategy=None)
        OR list of sentence embeddings (if combine_strategy != None)
    """
    batches = batch_iterable(text_list, batch_size=batch_size)
    embeddings = np.array([])
    for batch in tqdm(batches):
        input_ids_tensor = self._tokenize(batch)
        input_ids_tensor = input_ids_tensor.to(self.device)  # added device assignment to the tensor
```

Feel free to make the changes if you find appropriate.
Thank you once again.