csxmli2016 / MARCONet

Learning Generative Structure Prior for Blind Text Image Super-resolution [CVPR 2023]

Multi-GPU training #19

Open wojiaoyanmin opened 6 months ago

wojiaoyanmin commented 6 months ago

Hi, multi-GPU training has been consistently failing. Could you share a screenshot of 'pip list' showing the version of each installed package, or an environment image file if one is available?

csxmli2016 commented 6 months ago

Hi, could you show me the error you are getting? For package versions, you can refer to the ones I use (screenshots: s1, s2).

wojiaoyanmin commented 6 months ago

Thanks. The problem occurs in multi-node, multi-GPU training; single-GPU training works fine. (screenshot: image)

csxmli2016 commented 6 months ago

I am not sure what causes this problem. You could check whether the number of GPU IDs listed in CUDA_VISIBLE_DEVICES equals the value of the nproc_per_node parameter.
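
The check suggested above can be run on each node before launching. The snippet below is only a minimal sketch, not part of MARCONet: the helper names `count_visible_gpus` and `matches_nproc` are hypothetical, and it assumes CUDA_VISIBLE_DEVICES is a comma-separated list of device IDs. If the variable is unset, all GPUs on the node are visible and the sketch cannot make the comparison, so it returns None.

```python
import os


def count_visible_gpus(env=None):
    """Count the GPU IDs listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (all GPUs are visible then,
    so there is no explicit list to count).
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Ignore empty entries such as a trailing comma or an empty string.
    return len([dev for dev in value.split(",") if dev.strip()])


def matches_nproc(nproc_per_node, env=None):
    """Return True/False if the counts can be compared, None if they cannot."""
    visible = count_visible_gpus(env)
    if visible is None:
        return None
    return visible == nproc_per_node


if __name__ == "__main__":
    # Hypothetical value: replace with the nproc_per_node you pass to the launcher.
    nproc_per_node = 4
    print("visible GPUs:", count_visible_gpus())
    print("matches nproc_per_node:", matches_nproc(nproc_per_node))
```

Running this on every node with the same nproc_per_node you give the launcher should show quickly whether one machine exposes fewer GPUs than the others. It does not diagnose other multi-node issues such as network or rendezvous settings.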