bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

OOM when running Multi-gpu inference using A100 40GB #168

Closed phqtuyen closed 5 months ago

phqtuyen commented 1 year ago

Has anyone faced an OOM issue when running inference on an A100 40GB? The OOM error occurs when the process tries to load the model shards into GPU memory. Much appreciated.

loubnabnl commented 1 year ago

Hi, which model are you trying to evaluate, and what are your execution command and accelerate config? You could try decreasing the batch size or sequence length, using fp16 or bf16 precision, or even loading the model with load_in_8bit/load_in_4bit to save memory.
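A rough back-of-envelope estimate shows why precision matters here. This sketch only counts model weights (activations and the KV cache add overhead on top), and uses ~15.5B parameters as an illustrative size (roughly StarCoder scale); plug in your own model's parameter count:

```python
# Rough GPU memory needed just to hold model weights, by precision.
# Activations, KV cache, and framework overhead come on top of this.
def weight_memory_gib(n_params_billion: float, bytes_per_param: float) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

params_b = 15.5  # illustrative: a ~15.5B-parameter model (StarCoder scale)
for name, bytes_pp in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name:9s}: {weight_memory_gib(params_b, bytes_pp):5.1f} GiB")
```

At fp32, weights alone (~58 GiB) already exceed a 40GB A100, which would explain an OOM while loading shards; fp16/bf16 (~29 GiB) fits with some headroom for activations, and 8-bit/4-bit loading leaves even more.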