chanchimin closed this issue 10 months ago
Thank you for your interest! During inference of the 7B model, we use a single GPU with 24 GB of memory, so I'm not sure why you got an OOM error. Could you try a different 7B model, e.g., Llama2-7b-hf? If you still get the same OOM error, it might come from the vllm part, and it may be better to ask in their GitHub issues!
I'm closing this issue now, as I'm not sure whether this comes from the Self-RAG model checkpoints themselves, but feel free to reopen it!
Thank you for clarifying. I no longer encounter the OOM issue; it might be because someone else had occupied the GPU memory and I did not notice.
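For anyone hitting the same symptom: one way to rule out another process holding GPU memory before loading the model is to query `nvidia-smi`. The query flags below are standard `nvidia-smi` options, but the helper function names are just a sketch of my own:

```python
import subprocess

def parse_gpu_memory(csv_line: str) -> tuple[int, int]:
    """Parse one line of `nvidia-smi --query-gpu=memory.used,memory.total
    --format=csv,noheader,nounits` output into (used_mib, total_mib)."""
    used, total = (int(x.strip()) for x in csv_line.split(","))
    return used, total

def free_mib(csv_line: str) -> int:
    """Free GPU memory in MiB for one nvidia-smi CSV line."""
    used, total = parse_gpu_memory(csv_line)
    return total - used

if __name__ == "__main__":
    # Query all visible GPUs; each output line describes one device.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for i, line in enumerate(out.strip().splitlines()):
        print(f"GPU {i}: {free_mib(line)} MiB free")
```

If another process shows large `memory.used` on your target GPU, that alone can explain an OOM even on an 80 GB card.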
Hello, I appreciate the effort you’ve put into your work!
I’ve been trying to execute your quick start code, but I’ve run into an Out Of Memory (OOM) error, despite having an 80 GB GPU at my disposal. I was under the impression that a 7B model would fit comfortably within 80 GB of GPU memory, so I’m unsure why I’m still facing this OOM error. Could you possibly shed some light on this issue? Thanks!
And, by the way, could you tell me the typical memory usage when executing this code snippet?
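As a rough back-of-envelope estimate (not stated in the thread): the weights of a 7B-parameter model in fp16/bf16 take about 14 GB. Actual usage will be noticeably higher because vLLM also pre-allocates GPU memory for the KV cache (controlled by its `gpu_memory_utilization` setting), so don't be surprised if `nvidia-smi` reports far more than the weight footprint:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight-only memory footprint in GB (1 GB = 1e9 bytes).

    bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8.
    """
    return n_params * bytes_per_param / 1e9

# A 7B model in fp16 (2 bytes per parameter):
print(weight_memory_gb(7e9))  # 14.0 GB for weights alone
```

This explains why the maintainer's 24 GB GPU suffices for 7B inference: ~14 GB of weights plus headroom for activations and KV cache fits within 24 GB.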