HandsOnLLM / Hands-On-Large-Language-Models

Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
https://www.llm-book.com/
Apache License 2.0

chapter 6 - OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU #7

Open · amscosta opened 1 month ago

amscosta commented 1 month ago

Hello, from the very beginning of chapter 6, when trying to run the Jupyter notebook locally on my GPU card with 8 GB of VRAM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
```

this results in: `OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB.` Any workaround is very welcome, for instance a less demanding model with almost similar results? Thanks.

jalammar commented 1 month ago

Might wanna try a smaller model like Gemma 2B
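For readers hitting the same error, here is a minimal sketch (my own addition, not code from the book or this thread) of the two usual workarounds: loading a smaller instruction-tuned model such as `google/gemma-2b-it`, or keeping Phi-3 mini but quantizing it to 4 bits with bitsandbytes. The model IDs and memory figures are assumptions on my part, not tested claims from the repo.

```python
# Hypothetical workaround sketch, not an official fix from the repo maintainers.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Option 1: a smaller model (assumed ID: google/gemma-2b-it; gated, so you must
# accept the Gemma license on the Hugging Face Hub first).
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="cuda",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# Option 2: keep Phi-3 mini but load it in 4-bit (needs `pip install bitsandbytes`).
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    quantization_config=quant_config,
    trust_remote_code=True,
)
```

Either option should bring the weights well under 8 GB (4-bit Phi-3 mini is roughly 2 to 3 GB), though actual peak usage also depends on sequence length and the installed transformers/bitsandbytes versions.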