andyzoujm / representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency
https://www.ai-transparency.org/
MIT License

Train a large model on multiple GPUs: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. #45

Closed: YerongLi closed this issue 3 months ago

YerongLi commented 3 months ago

I am not able to train a larger model on two GPUs. Does anyone know how to fix this with DeepSpeed?

I tried to run llama2_lorra.py via the script llama_lorra_tqa_7b.sh with 4 GPUs and hit the error below. I know that launching with --num_gpus=1 avoids it:

    ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`.

The single-GPU launch completes and evaluates:

    deepspeed --master_port $ds_master_port --num_gpus=1 src/llama2_lorra.py
    {'tqa_accuracy': 0.31334149326805383, 'arc-e_accuracy': 0.6614035087719298}
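A common way around this error (a minimal sketch, not the repository's llama2_lorra.py; the checkpoint name and config path are placeholders) is to load the model without `device_map='auto'` when launching under DeepSpeed, and let the DeepSpeed engine handle placement and sharding:

    # Sketch only: load without device_map='auto' so the distributed launcher
    # (DeepSpeed) controls placement; names and paths below are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, TrainingArguments

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",      # placeholder checkpoint
        torch_dtype=torch.bfloat16,
    )                                    # note: no device_map='auto'

    training_args = TrainingArguments(
        output_dir="./out",
        per_device_train_batch_size=1,
        deepspeed="ds_config.json",      # DeepSpeed config shards the model across GPUs
    )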
YerongLi commented 3 months ago

> I am not able to train a larger model on two GPUs. Does anyone know how to fix this with DeepSpeed?

I tried to dispatch the large model with my own device map, but received an OOM error:

    from accelerate import dispatch_model

    # auto_configure_device_map is my own helper that splits the model's layers across 4 GPUs
    device_map = auto_configure_device_map(4)
    print(device_map)
    # Place each submodule on the device assigned in device_map
    model = dispatch_model(model, device_map=device_map)
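For context (not a fix): `dispatch_model` is accelerate's big-model placement for inference-style model parallelism, so even with a correct device map, full gradients and optimizer states still land on each device during training, which plausibly explains the OOM. A hypothetical version of such a dispatcher, assuming `model` is an already-loaded Llama-2 model, could use accelerate's `infer_auto_device_map` instead of a hand-written helper:

    # Hypothetical dispatcher built from accelerate's own utilities; the memory
    # budget and module class are illustrative, and this still does not shard
    # training state, so it does not solve the OOM during training.
    from accelerate import dispatch_model, infer_auto_device_map

    device_map = infer_auto_device_map(
        model,
        max_memory={i: "20GiB" for i in range(4)},      # illustrative per-GPU budget
        no_split_module_classes=["LlamaDecoderLayer"],  # keep decoder blocks intact
    )
    model = dispatch_model(model, device_map=device_map)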
YerongLi commented 3 months ago

Fixed training with DeepSpeed.
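The comment does not say which settings were used; a typical way this gets fixed is DeepSpeed ZeRO (stage 2 or 3), which shards parameters, gradients, and optimizer states across GPUs instead of relying on `device_map`. A minimal, illustrative sketch of such a config (values are placeholders, not the author's actual setup), passed through the HF Trainer:

    # Illustrative DeepSpeed ZeRO-3 settings; not the author's actual config.
    ds_config = {
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,                # shard params, grads, and optimizer states
            "overlap_comm": True,
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }

    # Passed via transformers.TrainingArguments(deepspeed=ds_config) and launched with,
    # e.g., `deepspeed --num_gpus=4 src/llama2_lorra.py ...`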