FreedomIntelligence / HuatuoGPT-II

HuatuoGPT2, One-stage Training for Medical Adaption of LLMs. (An Open Medical GPT)

Issue when running the training script: "ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`." #32

Open litsh opened 5 months ago

litsh commented 5 months ago

I am running train.sh in an environment where all packages were installed with

pip install -r requirements.txt

But it fails with the error below:

Traceback (most recent call last):
  File "train_huatuo.py", line 265, in <module>
    train(args)
  File "train_huatuo.py", line 145, in train
    model, optimizer, train_dataloader,  lr_scheduler = accelerator.prepare(model, optimizer, train_dataloader, lr_scheduler)
  File "/fdudata/tsli/HuatuoGPT-II/huatuo2/lib/python3.8/site-packages/accelerate/accelerator.py", line 1250, in prepare
    raise ValueError(
ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`.

I have changed the "--num_processes" flag to 1, but it still gives the same error. Are there any suggestions for solving this problem?
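
For context, the check in `accelerator.prepare` appears to fire when the model carries a multi-device device map and Accelerate considers the run distributed, which may include a DeepSpeed configuration even with a single process. A minimal sketch of one possible workaround is to pass `device_map='auto'` only for plain single-process runs. The helper name `model_load_kwargs` is hypothetical, and the assumption that the launcher exports `WORLD_SIZE` and `ACCELERATE_USE_DEEPSPEED` should be checked against the installed Accelerate version; this is not the repository's own code.

```python
import os


def model_load_kwargs(env=None):
    """Return extra keyword arguments for `from_pretrained`.

    Hypothetical helper: `device_map='auto'` is returned only when the run
    looks like a plain single-process launch.
    """
    env = os.environ if env is None else env

    # Assumption: the launcher exports WORLD_SIZE for multi-process runs.
    world_size = int(env.get("WORLD_SIZE", "1"))

    # Assumption: `accelerate launch` exports this flag when DeepSpeed is on.
    use_deepspeed = env.get("ACCELERATE_USE_DEEPSPEED", "false").lower() == "true"

    if world_size > 1 or use_deepspeed:
        # Let Accelerate / DeepSpeed place the model; pass no device map.
        return {}
    return {"device_map": "auto"}


# Intended use inside the training script (not executed here):
#   model = AutoModelForCausalLM.from_pretrained(path, **model_load_kwargs())
```

If the model is loaded without a device map in this way, `accelerator.prepare` is left to handle device placement on its own.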