Closed: YerongLi closed this issue 3 months ago
I am not able to train a larger model on two GPUs; does anyone know how to fix this with DeepSpeed?
I tried to dispatch the large model with my own dispatcher, but received an OOM error.
from accelerate import dispatch_model

# auto_configure_device_map is my own helper that builds a layer -> GPU mapping
device_map = auto_configure_device_map(4)
print(device_map)
model = dispatch_model(model, device_map=device_map)
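Since the question is about DeepSpeed specifically, here is a minimal sketch of the ZeRO-3 route instead of dispatch_model (the model name and config values are illustrative assumptions, not taken from this issue):

import deepspeed
from transformers import AutoModelForCausalLM

# Assumed ZeRO-3 config: parameters, gradients and optimizer state are sharded
# across the ranks started by the deepspeed launcher.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "bf16": {"enabled": True},
}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative model
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)

This would be launched with something like deepspeed --num_gpus=2 train.py, so that each rank holds only a shard of the weights rather than a full copy.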
Fixed training with deepspeed
I tried to run llama2_lorra.py via the script llama_lorra_tqa_7b.sh on 4 GPUs, though I know that --num_gpus=1 fixes the issue.
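In case it helps others hitting the same OOM, here is a hedged sketch of how a ZeRO config is usually wired into Hugging Face TrainingArguments; the argument values and config path are assumptions for illustration, not taken from llama2_lorra.py:

from transformers import TrainingArguments

# Illustrative values only; the real script defines its own arguments.
training_args = TrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed="ds_config_zero3.json",  # assumed path to a ZeRO-3 JSON config
)

With a config like this, the Trainer hands sharding over to DeepSpeed, which is what the --num_gpus flag of the launcher controls.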