chanchimin closed this issue 10 months ago
Thank you for your interest! During inference of the 7B model, we use a single GPU with 24 GB of memory, so I'm not sure why you got an OOM error. Could you try a different 7B model, e.g., Llama2-7b-hf? If you still get the same OOM error, it might come from the vllm part, and it may be better to ask in their GitHub issues!
I'm closing this issue now, as I'm not sure whether this comes from the Self-RAG model checkpoints themselves, but feel free to reopen it!
Thank you for clarifying. I no longer encounter the OOM issue; it might be because someone else had occupied the GPU memory and I did not notice.
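For anyone hitting the same symptom: one way to rule out another process holding GPU memory before loading the model is to query `nvidia-smi`. The query flags below are standard `nvidia-smi` options, but the helper function names are just a sketch of my own:

```python
import subprocess

def parse_gpu_memory(csv_line: str) -> tuple[int, int]:
    """Parse one line of `nvidia-smi --query-gpu=memory.used,memory.total
    --format=csv,noheader,nounits` output into (used_mib, total_mib)."""
    used, total = (int(x.strip()) for x in csv_line.split(","))
    return used, total

def free_mib(csv_line: str) -> int:
    """Free GPU memory in MiB for one nvidia-smi CSV line."""
    used, total = parse_gpu_memory(csv_line)
    return total - used

if __name__ == "__main__":
    # Query all visible GPUs; each output line describes one device.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for i, line in enumerate(out.strip().splitlines()):
        print(f"GPU {i}: {free_mib(line)} MiB free")
```

If another process shows large `memory.used` on your target GPU, that alone can explain an OOM even on an 80 GB card.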
Hello, I appreciate the effort you’ve put into your work!
I’ve been trying to execute your quick start code, but I’ve run into an Out Of Memory (OOM) error, despite having an 80 GB GPU at my disposal. I was under the impression that a 7B model would fit comfortably within 80 GB of GPU memory, so I’m unsure why I’m still facing this OOM error. Could you possibly shed some light on this issue? Thanks!
And, by the way, could you tell me the typical memory usage when executing this code snippet?
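As a rough back-of-envelope estimate (not stated in the thread): the weights of a 7B-parameter model in fp16/bf16 take about 14 GB. Actual usage will be noticeably higher because vLLM also pre-allocates GPU memory for the KV cache (controlled by its `gpu_memory_utilization` setting), so don't be surprised if `nvidia-smi` reports far more than the weight footprint:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight-only memory footprint in GB (1 GB = 1e9 bytes).

    bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8.
    """
    return n_params * bytes_per_param / 1e9

# A 7B model in fp16 (2 bytes per parameter):
print(weight_memory_gb(7e9))  # 14.0 GB for weights alone
```

This explains why the maintainer's 24 GB GPU suffices for 7B inference: ~14 GB of weights plus headroom for activations and KV cache fits within 24 GB.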