OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License

Single-machine, single-GPU training exits with an error after finishing one epoch #151

Closed Bothgone closed 1 year ago

Bothgone commented 1 year ago

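As the title says: I modified the distributed training parameters as described in the tutorial, but every time training finishes one epoch it fails with `StopIteration`, `Exception in thread`, and `ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13312)`. What should I change?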

Bothgone commented 1 year ago

Found the problem: it was caused by something I had modified myself. It is fixed now.

Claire-YC commented 12 months ago

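"Found the problem: it was caused by something I had modified myself. It is fixed now."

Hello, I have run into the same problem. Could you tell me what you changed at the time? Thank you.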