First of all, thank you for your great work.
I'm currently using your code to train on another dataset and have run into the problem shown in the attached image.
After installing APEX successfully, I ran the training code on 2 A6000 GPUs, and it stops after a few epochs.
Another question: my server kept restarting the first several times I ran the training code, before I could get it to run. Do you know why?
Thank you very much.