[Closed] WindSearcher closed this issue 11 months ago.
I see. For `run_ptpcg.sh`, you can refer to the distributed-training setup in `train_multi.sh`:
```bash
# CUDA_VISIBLE_DEVICES=${GPUS} python -u run_dee_task.py \
CUDA_VISIBLE_DEVICES=${GPUS} python -m torch.distributed.launch --master_port=25662 --nproc_per_node ${NUM_GPUS} run_dee_task.py
```

Just keep the original line as a comment and add this new one.
Finally, you also need to add the `--parallel_decorate` flag.
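Putting the two changes together, here is a minimal sketch of what the launch section of `run_ptpcg.sh` could look like after the edit. The `GPUS`/`NUM_GPUS` values and the placement of other arguments are illustrative assumptions, not copied from the repository:

```bash
#!/usr/bin/env bash
# Sketch of the distributed-launch section of run_ptpcg.sh,
# adapted from train_multi.sh as described above.

GPUS="0,7"    # assumed example: two physical GPUs
NUM_GPUS=2    # must match the number of GPUs listed in ${GPUS}

# Original single-process command, kept as a comment:
# CUDA_VISIBLE_DEVICES=${GPUS} python -u run_dee_task.py \

# torch.distributed.launch spawns ${NUM_GPUS} worker processes,
# one per visible GPU; --parallel_decorate presumably makes
# run_dee_task.py wrap the model for multi-GPU training,
# as suggested in this thread.
CUDA_VISIBLE_DEVICES=${GPUS} python -m torch.distributed.launch \
    --master_port=25662 \
    --nproc_per_node ${NUM_GPUS} \
    run_dee_task.py \
    --parallel_decorate
    # (any other task-specific arguments from the original script go here)
```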
Hello, I'd like to ask about distributed training of PTPCG. As shown in the screenshot below, I launch it on GPUs 0 and 7, but it seems to actually train only on GPU 0.
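For reference, a quick way to sanity-check a two-GPU launch (an illustrative sketch using the same assumed values as above, not the exact repository invocation):

```bash
# Launch with physical GPUs 0 and 7 visible and two worker processes;
# inside each worker, the visible GPUs are remapped to local indices 0 and 1.
CUDA_VISIBLE_DEVICES=0,7 python -m torch.distributed.launch \
    --master_port=25662 --nproc_per_node 2 \
    run_dee_task.py --parallel_decorate

# In another shell, confirm that both GPUs host an active process:
nvidia-smi
```

If `--parallel_decorate` is omitted, only one GPU doing the actual work would match the symptom described here, which is why the reply above says to add it.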