UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

CUDA out of memory @ `util.paraphrase_mining` #1712

Open PhilipMay opened 2 years ago

PhilipMay commented 2 years ago

Hi,

I am using `util.paraphrase_mining` on 3,463,703 sentences with a 16 GB GPU:

paraphrases = util.paraphrase_mining(
    model, sentences, 
    show_progress_bar=True,
    batch_size=128, 
)

I am getting a CUDA out of memory error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [7], line 1
----> 1 paraphrases = util.paraphrase_mining(
      2     model, sentences, 
      3     show_progress_bar=True,
      4     batch_size=128, 
      5 #    query_chunk_size=10_000,  # def: 5000
      6 #    corpus_chunk_size=200_000,  # def: 100000
      7 )

File ~/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/sentence_transformers/util.py:130, in paraphrase_mining(model, sentences, show_progress_bar, batch_size, *args, **kwargs)
    113 """
    114 Given a list of sentences / texts, this function performs paraphrase mining. It compares all sentences against all
    115 other sentences and returns a list with the pairs that have the highest cosine similarity score.
   (...)
    126 :return: Returns a list of triplets with the format [score, id1, id2]
    127 """
    129 # Compute embedding for the sentences
--> 130 embeddings = model.encode(sentences, show_progress_bar=show_progress_bar, batch_size=batch_size, convert_to_tensor=True)
    132 return paraphrase_mining_embeddings(embeddings, *args, **kwargs)

File ~/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py:195, in SentenceTransformer.encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
    192 all_embeddings = [all_embeddings[idx] for idx in np.argsort(length_sorted_idx)]
    194 if convert_to_tensor:
--> 195     all_embeddings = torch.stack(all_embeddings)
    196 elif convert_to_numpy:
    197     all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])

RuntimeError: CUDA out of memory. Tried to allocate 9.91 GiB (GPU 0; 15.75 GiB total capacity; 10.95 GiB already allocated; 85.56 MiB free; 11.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am using a model based on xlm-r-distilroberta-base-paraphrase-v1 and the following packages:

sentence-transformers 2.2.2
torch                 1.12.1
transformers          4.22.2
PhilipMay commented 2 years ago

I guess all_embeddings = torch.stack(all_embeddings) should be done on CPU and not on GPU?

https://github.com/UKPLab/sentence-transformers/blob/a8cebb235066cacf533d073b8c8250e5e7b04c3d/sentence_transformers/SentenceTransformer.py#L195

PhilipMay commented 2 years ago

Putting this before the "stack" might fix the bug: all_embeddings = [e.cpu() for e in all_embeddings].
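
A minimal sketch of where that line would sit in SentenceTransformer.encode, based on the 2.2.2 code shown in the traceback above (the surrounding variable names come from that file; the comment is added here):

if convert_to_tensor:
    # Move every per-batch embedding to CPU first, so that torch.stack
    # allocates the large (num_sentences x dim) matrix in host RAM
    # instead of on the GPU.
    all_embeddings = [emb.cpu() for emb in all_embeddings]
    all_embeddings = torch.stack(all_embeddings)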

PhilipMay commented 2 years ago

@nreimers the solution above works for me and fixes the issue. I am not 100% sure of the side effects. Is it ok to move all tensors in the list from GPU to CPU?

What do you think? Should I create a PR?

Many thanks Philip
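
For reference, a user-level sketch of the same idea that needs no library patch: encode in chunks, move each chunk to CPU, and run the mining step on the concatenated CPU tensor. The chunk size is illustrative; model and sentences are as in the original snippet.

import torch
from sentence_transformers import util

cpu_chunks = []
chunk_size = 100_000  # illustrative; pick whatever fits your GPU
for start in range(0, len(sentences), chunk_size):
    emb = model.encode(
        sentences[start:start + chunk_size],
        batch_size=128,
        convert_to_tensor=True,
        show_progress_bar=True,
    )
    cpu_chunks.append(emb.cpu())  # copy to host RAM; the GPU copy can then be freed

embeddings = torch.cat(cpu_chunks)
paraphrases = util.paraphrase_mining_embeddings(embeddings)

Note that the mining step then runs on the CPU tensor, which is exactly the slowdown described in the next comment.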

nreimers commented 2 years ago

Hi @PhilipMay, sadly it has side effects, and whether you want them depends on the use case.

If your GPU has enough memory, you want to keep the tensors on the GPU, because the subsequent similarity computation (paraphrase_mining_embeddings in this case) runs much faster when the embeddings stay there.

So you only want this line if you run OOM, which means some option would be needed.
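
A tiny sketch of what such an option could look like; the stack_on_cpu flag and this helper are hypothetical and not part of the library:

import torch

def stack_embeddings(per_batch_embeddings, stack_on_cpu=False):
    # Hypothetical opt-in: only pay the GPU-to-CPU copy when the caller asks
    # for it (e.g. because the stacked matrix would not fit in GPU memory);
    # otherwise keep everything on the GPU for fast similarity search.
    if stack_on_cpu:
        per_batch_embeddings = [e.cpu() for e in per_batch_embeddings]
    return torch.stack(per_batch_embeddings)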

Also, torch.stack currently doubles the memory requirement, because at some point it holds both the old per-batch tensors and the new stacked tensor.

Maybe a better solution would be to create the final matrix up-front in the encode method and write the generated embeddings into this result matrix? Then we wouldn't have the overhead of duplicating all embeddings.
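
A rough sketch of that pre-allocation idea; the function is illustrative and not the library's encode implementation:

import torch

def encode_into_preallocated(model, sentences, batch_size=128, device="cpu"):
    # Allocate the final (num_sentences x dim) matrix once, then write each
    # encoded batch into its slice, so no second full-size copy is ever needed.
    dim = model.get_sentence_embedding_dimension()
    out = torch.empty(len(sentences), dim, device=device)
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        emb = model.encode(batch, batch_size=batch_size, convert_to_tensor=True)
        out[start:start + len(batch)] = emb.to(device)
    return out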

Lavriz commented 1 year ago

> @nreimers the solution above works for me and fixes the issue. I am not 100% sure of the side effects. Is it ok to move all tensors in the list from GPU to CPU?
>
> What do you think? Should I create a PR?
>
> Many thanks Philip

Hey @PhilipMay! Thank you for providing the fix.

I was wondering whether you encountered an issue like that after using this solution: the task is finished according to the progress bar, but it's still running in Jupyter (having an asterisk)?

PhilipMay commented 1 year ago

> I was wondering whether you encountered an issue like that after using this solution: the task is finished according to the progress bar, but it's still running in Jupyter (having an asterisk)?

No, I cannot remember anything like that.