andyzoujm / representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency
https://www.ai-transparency.org/
MIT License

Train a large model on multiple GPUs: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. #45

Closed: YerongLi closed this issue 3 months ago

YerongLi commented 3 months ago

I am not able to train a larger model on two GPUs. Does anyone know how to fix this with DeepSpeed?

I tried to run llama2_lorra.py via the script llama_lorra_tqa_7b.sh with 4 GPUs and hit the error below. I know that launching with --num_gpus=1 avoids it:

    ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`.

The single-GPU launch completes and evaluates:

    deepspeed --master_port $ds_master_port --num_gpus=1 src/llama2_lorra.py
    {'tqa_accuracy': 0.31334149326805383, 'arc-e_accuracy': 0.6614035087719298}
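A common way around this error (a minimal sketch, not the repository's llama2_lorra.py; the checkpoint name and config path are placeholders) is to load the model without `device_map='auto'` when launching under DeepSpeed, and let the DeepSpeed engine handle placement and sharding:

    # Sketch only: load without device_map='auto' so the distributed launcher
    # (DeepSpeed) controls placement; names and paths below are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, TrainingArguments

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",      # placeholder checkpoint
        torch_dtype=torch.bfloat16,
    )                                    # note: no device_map='auto'

    training_args = TrainingArguments(
        output_dir="./out",
        per_device_train_batch_size=1,
        deepspeed="ds_config.json",      # DeepSpeed config shards the model across GPUs
    )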
YerongLi commented 3 months ago

> I am not able to train a larger model on two GPUs. Does anyone know how to fix this with DeepSpeed?

I tried to dispatch the large model with my own device map, but received an OOM error:

    from accelerate import dispatch_model

    # auto_configure_device_map is my own helper that splits the model's layers across 4 GPUs
    device_map = auto_configure_device_map(4)
    print(device_map)
    # Place each submodule on the device assigned in device_map
    model = dispatch_model(model, device_map=device_map)
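For context (not a fix): `dispatch_model` is accelerate's big-model placement for inference-style model parallelism, so even with a correct device map, full gradients and optimizer states still land on each device during training, which plausibly explains the OOM. A hypothetical version of such a dispatcher, assuming `model` is an already-loaded Llama-2 model, could use accelerate's `infer_auto_device_map` instead of a hand-written helper:

    # Hypothetical dispatcher built from accelerate's own utilities; the memory
    # budget and module class are illustrative, and this still does not shard
    # training state, so it does not solve the OOM during training.
    from accelerate import dispatch_model, infer_auto_device_map

    device_map = infer_auto_device_map(
        model,
        max_memory={i: "20GiB" for i in range(4)},      # illustrative per-GPU budget
        no_split_module_classes=["LlamaDecoderLayer"],  # keep decoder blocks intact
    )
    model = dispatch_model(model, device_map=device_map)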
YerongLi commented 3 months ago

Fixed training with DeepSpeed.
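The comment does not say which settings were used; a typical way this gets fixed is DeepSpeed ZeRO (stage 2 or 3), which shards parameters, gradients, and optimizer states across GPUs instead of relying on `device_map`. A minimal, illustrative sketch of such a config (values are placeholders, not the author's actual setup), passed through the HF Trainer:

    # Illustrative DeepSpeed ZeRO-3 settings; not the author's actual config.
    ds_config = {
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,                # shard params, grads, and optimizer states
            "overlap_comm": True,
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }

    # Passed via transformers.TrainingArguments(deepspeed=ds_config) and launched with,
    # e.g., `deepspeed --num_gpus=4 src/llama2_lorra.py ...`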