dbiir / UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
https://github.com/dbiir/UER-py/wiki
Apache License 2.0

Multi-GPU training fails. What is the cause? #23

Open luluforever opened 5 years ago

luluforever commented 5 years ago

AttributeError: module 'torch.distributed' has no attribute 'init_process_group'

luluforever commented 5 years ago

One more question: how do I set the number of pre-training epochs?

zhezhaoa commented 5 years ago

Sorry for the late response. When you pre-train on multiple machines, make sure that the PyTorch version is the same on every machine.
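To compare machines, something like the following could be run on each one. It is only a minimal sketch, not part of UER: `distributed_report` is a hypothetical helper name, and it assumes that one possible cause of the AttributeError is a PyTorch build without distributed support, in which case `torch.distributed` imports but does not expose `init_process_group`.

```python
import importlib


def distributed_report():
    """Collect facts about the local PyTorch install (hypothetical helper)."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    try:
        dist = importlib.import_module("torch.distributed")
    except ImportError:
        dist = None

    # is_available() may be missing on very old releases, so look it up defensively.
    is_available = getattr(dist, "is_available", None)
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "distributed_available": bool(is_available()) if callable(is_available) else None,
        "has_init_process_group": hasattr(dist, "init_process_group"),
    }


if __name__ == "__main__":
    for key, value in distributed_report().items():
        print(f"{key}: {value}")
```

If `torch_version` differs between machines, or `has_init_process_group` is `False` on any of them, that machine's installation is the first thing to look at.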

In fact, most pre-training work reports the number of training steps rather than epochs. For now, UER does not include an epoch option.
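If you still think in epochs, you can convert them to a step count yourself. The sketch below is an illustration, not UER code: `steps_for_epochs` and its parameters are hypothetical names, and it assumes that the batch size is counted per GPU and that every GPU processes one batch per step. Check how your own setup counts the batch size before relying on the result.

```python
import math


def steps_for_epochs(num_instances, batch_size, world_size, epochs):
    """Convert a number of epochs to training steps (hypothetical helper).

    Assumes batch_size is per GPU and world_size is the total number of GPUs,
    so each step consumes batch_size * world_size instances.
    """
    if min(num_instances, batch_size, world_size, epochs) <= 0:
        raise ValueError("all arguments must be positive")
    global_batch = batch_size * world_size
    # Round up so the last partial batch of each epoch is still counted.
    steps_per_epoch = math.ceil(num_instances / global_batch)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 1,000,000 instances, 32 per GPU, 8 GPUs, 3 epochs
    print(steps_for_epochs(1_000_000, 32, 8, 3))  # 11721
```

The result can then be used as the total number of training steps.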