bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0
702 stars 180 forks

OOM when running Multi-gpu inference using A100 40GB #168

Closed phqtuyen closed 1 week ago

phqtuyen commented 7 months ago

Has anyone faced an OOM issue when running inference on an A100 40GB? The OOM error occurs when the process tries to load a model shard into GPU memory. Much appreciated.

loubnabnl commented 7 months ago

Hi, which model are you trying to evaluate, and what are your execution command and accelerate config? You could try decreasing the batch size or sequence length, using fp16 or bf16 precision, or even load_in_8bit/load_in_4bit to save memory.
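
As a back-of-the-envelope check of why a 40 GB A100 can OOM while loading shards, the sketch below estimates the memory needed just to hold the weights at each precision. The 15B parameter count is a hypothetical example (roughly StarCoder-scale), not a figure from this thread, and the estimate ignores activations, the KV cache, and loading overhead:

```python
# Rough weight-memory estimate per precision.
# Assumption: a hypothetical 15B-parameter model; weights only,
# excluding activations, KV cache, and temporary buffers during loading.
GIB = 1024**3

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate GPU memory (GiB) needed just to hold the weights."""
    return n_params * bytes_per_param / GIB

n_params = 15e9  # hypothetical model size
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name:>9}: ~{weight_memory_gib(n_params, nbytes):.0f} GiB")
```

At this scale, fp32 weights alone (~56 GiB) already exceed a single 40 GB A100, while fp16/bf16 (~28 GiB) or 8-bit (~14 GiB) would fit, which is why lowering the precision or quantizing often resolves the OOM. In the harness these options are typically passed as CLI flags such as `--precision fp16` or `--load_in_8bit`, though the exact flag names may vary with your version.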