Open zhangxiaofan-star opened 1 year ago
That's interesting. Are you using the DataParallel model for multiple GPUs? I have trained on up to 4 GPUs before with no issues other than needing to change some code from model.attr to model.module.attr.
However, when I have done this, all the GPUs have been on the same HPC node.
Were you ever able to get this resolved?
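To illustrate the model.attr vs. model.module.attr change mentioned above: torch.nn.DataParallel stores the wrapped model on a .module attribute, so custom attributes defined on the original model are no longer reachable directly on the wrapper. The sketch below uses a plain-Python stand-in class (hypothetical, no torch required) just to show the attribute-access pattern; it is not the actual DataParallel implementation.

```python
class MyModel:
    """Stand-in for a user-defined nn.Module with a custom attribute."""
    def __init__(self):
        self.hidden_size = 256  # custom attribute set on the original model


class DataParallelLike:
    """Minimal stand-in mimicking how torch.nn.DataParallel wraps a model:
    the original model is stored on a `.module` attribute, and custom
    attributes are NOT forwarded by the wrapper."""
    def __init__(self, module):
        self.module = module  # original model lives here


model = MyModel()
wrapped = DataParallelLike(model)

# Before wrapping, the attribute is reachable directly.
assert model.hidden_size == 256

# After wrapping, it must be accessed through .module.
assert wrapped.module.hidden_size == 256
assert not hasattr(wrapped, "hidden_size")
```

The same access pattern applies to DistributedDataParallel, which also exposes the underlying model via .module.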
Hello, thank you very much for the work in this paper; I learned a lot from it. When running single-machine multi-GPU training with the distributed method, I encountered the following error.
Could you help me figure it out?