Open AAA2023AAA opened 7 months ago
It's difficult to say without seeing the full code. Could you share your full code? Also, which GPU did you select?
I use an A100. I tried to use the same code that you provided for arXiv papers and Llama 2 ('meta-llama/Llama-2-13b-chat-hf'): https://towardsdatascience.com/topic-modeling-with-llama-2-85177d01e174
You did not change a single thing? For the sake of simplicity, could you still copy-paste your full code directly? You never know where the issue might be, even in preparing your dataset!
Lastly, when exactly do you get the out-of-memory error? Is that during `.fit`?
By the way, it might also be worthwhile to check out the official documentation for a number of other examples of running LLMs with BERTopic. There are also a couple of pre-quantized models explored: https://maartengr.github.io/BERTopic/getting_started/representation/llm.html#zephyr-mistral-7b
I have only changed the dataset cell:

```python
import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Article_body.csv")
docs = df["Article Body"].fillna('').astype(str)
```
First, make sure that your `docs` are a list of strings and not a pandas Series. Second, when exactly do you get the out-of-memory error? Is that during `.fit`? If so, which steps were already completed before giving the error? You can see this in the logs when running `.fit`.
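For reference, a minimal sketch of that conversion (the column name and sample rows here are stand-ins for the real CSV):

```python
import pandas as pd

# Hypothetical two-row frame standing in for the real CSV
df = pd.DataFrame({"Article Body": ["First article text", None]})

# fillna + astype still gives a pandas Series; .tolist() is the key step,
# since BERTopic expects a plain list of strings
docs = df["Article Body"].fillna('').astype(str).tolist()

print(type(docs), docs)
# <class 'list'> ['First article text', '']
```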
Yes, I use: `docs = docs.tolist()`
```
OutOfMemoryError                          Traceback (most recent call last)
28 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in softmax(input, dim, _stacklevel, dtype)
   1856         ret = input.softmax(dim)
   1857     else:
-> 1858         ret = input.softmax(dim, dtype=dtype)
   1859     return ret
   1860

OutOfMemoryError: CUDA out of memory. Tried to allocate 3.05 GiB. GPU 0 has a total capacty of 15.77 GiB of which 2.49 GiB is free. Process 11204 has 13.28 GiB memory in use. Of the allocated memory 8.85 GiB is allocated by PyTorch, and 3.20 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
In that case, it might be worthwhile to try out a quantized model like the ones in the link I shared above. It seems the model needs more VRAM than anticipated for your use case, so either increasing VRAM (e.g., to 24 GB) or using a smaller model would be the fix for you.
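For completeness, a sketch of loading Llama 2 in 4-bit via bitsandbytes, along the lines of the documentation linked above (the model name comes from this thread; the specific quantization settings are assumptions to tune):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization roughly quarters the VRAM needed for the model weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place layers on the available GPU(s)
)
```

Note that this requires `bitsandbytes` and `accelerate` to be installed and a CUDA-capable GPU to be present.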
Hi Maarten,
Thank you for the excellent work on BERTopic.
Until now, I had been using BERTopic v0.15 with Llama 2 for representation, and it was working very well. However, I decided to upgrade to v0.16 to test the new functionalities, such as zero-shot topic modeling.
Now, with the same code and data, I encounter an OutOfMemoryError (CUDA out of memory). The error message is as follows: "Tried to allocate 3.38 GiB. GPU 0 has a total capacity of 11.99 GiB, of which 1.02 GiB is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory, 8.93 GiB is allocated by PyTorch, and 57.52 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large, try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF."
Do you have any idea what could be causing this change in memory usage between versions 0.15 and 0.16?
@Virginie74 There were no changes with respect to memory usage between those two versions. My guess, especially with LLMs, is that the underlying package that loads the model was updated (or the LLM itself). For instance, it may happen that newer versions of transformers, sentence-transformers, etc. take up a bit more VRAM, which causes the OOM errors. I would advise using quantized models instead.
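One more knob worth trying before swapping models is the allocator setting that the error message itself suggests. A minimal sketch (the 128 MB value is an assumption to tune, not a recommendation from this thread):

```python
import os

# Must be set before the first CUDA allocation, i.e., before importing torch
# in practice; smaller split sizes reduce fragmentation at some throughput cost
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```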
I'm facing the same issue with Llama 2, using the same example at https://colab.research.google.com/drive/1QCERSMUjqGetGGujdrvv_6_EeoIcd_9M?usp=sharing with BERTopic 0.16.
But if I change `!pip install bertopic datasets accelerate bitsandbytes xformers adjustText` to `!pip install bertopic==0.15 datasets accelerate bitsandbytes xformers adjustText`, it works without problems.
@crookedreyes I just checked the changelog again, and the only change that might affect CUDA memory is the automatic truncation of documents. If you set `doc_length=100` and `tokenizer="char"`, do you then still get this issue?
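To illustrate what that truncation does, here is a rough sketch (not BERTopic's actual implementation): with `tokenizer="char"`, `doc_length` acts as a character budget per document before it reaches the LLM.

```python
def truncate_document(doc: str, doc_length: int = 100) -> str:
    """Rough sketch of per-document truncation with tokenizer="char":
    keep at most `doc_length` characters of each document."""
    return doc[:doc_length]

doc = "word " * 1000                     # a 5000-character document
print(len(truncate_document(doc)))       # 100
```

Shorter documents mean shorter prompts, which directly lowers peak VRAM during representation.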
@MaartenGr Thank you for your response and help.
It worked! I just added the parameters to `TextGeneration()`:

```python
llama2 = TextGeneration(generator, prompt=prompt, doc_length=10, tokenizer="char")
```
Hello,
I'm Ahmad, who asked you on LinkedIn. Thank you for your response. I have Colab Pro+. I tried to print statistics about my dataset:

```
count    21006.000000
mean       490.140341
std        217.119297
min          1.000000
25%        340.000000
50%        471.000000
75%        610.000000
max       3163.000000
Name: article_body_length, dtype: float64
```
How can I solve the out-of-memory issues? Thank you in advance.
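Given statistics like those, capping the longest documents before they reach the LLM bounds the prompt size, similar to the `doc_length` fix earlier in this thread. A minimal pandas sketch, assuming the column name from the thread and that the lengths are word counts:

```python
import pandas as pd

# Hypothetical stand-in for the real CSV
df = pd.DataFrame({"Article Body": ["short article", "word " * 800]})

# Length in whitespace-separated tokens
df["article_body_length"] = df["Article Body"].str.split().str.len()
print(df["article_body_length"].describe())

# Cap every document at its first 500 tokens to bound prompt size
docs = [" ".join(d.split()[:500]) for d in df["Article Body"]]
```

The 500-token cap here is an illustrative value; with a mean around 490 but a max above 3000, it is the long tail that drives peak memory.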