VinAIResearch / PhoGPT

PhoGPT: Generative Pre-training for Vietnamese (2023)
Apache License 2.0

Hardware specification for inference #2

Closed leviethung2103 closed 10 months ago

leviethung2103 commented 10 months ago

Hello,

The model was trained on A100 GPUs. However, I am wondering about the GPU memory cost during inference.

Currently, I have a 3060 GPU with 12 GB of VRAM. Can it be used for running inference?

Thank you

datquocnguyen commented 10 months ago

You can just load the model in 8-bit, which takes about 7.5 GB of GPU memory, by passing load_in_8bit=True: https://huggingface.co/docs/transformers/main/en/main_classes/quantization
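A minimal sketch of what this looks like. The model id `vinai/PhoGPT-7B5-Instruct` and the parameter count are assumptions here (check the repo README for the exact id), and the back-of-the-envelope memory estimate just illustrates why int8 fits in 12 GB of VRAM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough memory estimate: int8 stores 1 byte per weight, so a ~7.5B-parameter
# model needs roughly 7.5 GB for weights alone (vs ~15 GB in fp16).
n_params = 7.5e9
weight_gb_int8 = n_params * 1 / 1e9   # ≈ 7.5 GB, fits a 12 GB RTX 3060
weight_gb_fp16 = n_params * 2 / 1e9   # ≈ 15 GB, does not fit

def load_phogpt_8bit(model_id: str = "vinai/PhoGPT-7B5-Instruct"):
    """Load PhoGPT with 8-bit weights (requires bitsandbytes + accelerate)."""
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_8bit=True,       # quantize weights to int8 at load time
        device_map="auto",       # place layers on the available GPU
        trust_remote_code=True,  # the model repo ships custom modeling code
    )
    return tokenizer, model
```

Actual usage will be somewhat above the weight footprint because of activations and the KV cache, but well within 12 GB for short sequences.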