Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.
(documentation) How do I know if generate.py is running on GPU / GPU configuration #449
Hi, I have an NVIDIA Quadro P5200 with 32GB of VRAM, yet when I run the code for a test it performs extremely slowly, and in Task Manager the GPU's memory usage stays near 0. I think the code is not using my GPU. Is there any special configuration needed beyond pip install -r requirements.txt to get this running on the GPU?
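One quick way to narrow this down (not an official troubleshooting step from the project, just a common first check) is to verify that the installed PyTorch build can see a CUDA device at all; a frequent cause of this symptom is having a CPU-only PyTorch wheel installed, in which case the code silently falls back to CPU:

```python
import torch

# CUDA version PyTorch was compiled against; None means a CPU-only build
print("Built with CUDA:", torch.version.cuda)

# True only if a usable CUDA device is visible to PyTorch
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. "Quadro P5200"
    print("Device:", torch.cuda.get_device_name(0))
```

If torch.cuda.is_available() prints False, reinstalling PyTorch with the CUDA wheel for your driver (per the selector on pytorch.org) is the usual fix, after which the scripts should pick up the GPU automatically.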