Nota-NetsPresso / shortened-llm

Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]

CUDA out of memory when running Taylor with llama3-8b #14

Open yaolu-zjut opened 3 months ago

yaolu-zjut commented 3 months ago

Hello, thank you for this detailed open-source work. I would like to reproduce your experiment on llama3-8b, but when I run the Taylor experiment, I get a CUDA out of memory error. I am using one A100 GPU with 40 GB of memory. Could you please suggest some solutions?
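The OOM is plausible from a back-of-envelope count: Taylor importance scoring needs a backward pass, so the weights and their gradients coexist on the GPU (activations and any calibration batch add more on top). A rough sketch, assuming roughly 8B parameters (the exact footprint depends on activations, batch size, and dtype):

```python
# Back-of-envelope GPU memory estimate for an ~8B-parameter model.
# Taylor-based pruning requires gradients, so at minimum one copy of
# the weights plus one copy of the gradients must fit in memory.

def params_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory in GiB for one full copy of the parameters."""
    return num_params * bytes_per_param / 1024**3

NUM_PARAMS = 8e9  # llama3-8b, approximately

# fp32: weights + gradients alone already exceed a 40 GiB card
fp32_total = 2 * params_memory_gib(NUM_PARAMS, 4)
# fp16/bf16: both copies are halved
fp16_total = 2 * params_memory_gib(NUM_PARAMS, 2)

print(f"fp32 weights+grads: {fp32_total:.1f} GiB")  # ~59.6 GiB
print(f"fp16 weights+grads: {fp16_total:.1f} GiB")  # ~29.8 GiB
```

So in fp32 the weights and gradients alone need roughly 60 GiB, well over 40 GiB, while half precision brings them to about 30 GiB. A common mitigation (not confirmed as the maintainers' recommendation) is to load the model in half precision, e.g. `torch_dtype=torch.bfloat16` in `transformers.AutoModelForCausalLM.from_pretrained`, and to keep the calibration batch small during the gradient pass.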