Hello! I am fascinated by your great idea and have been experimenting with your code, but I found what look like two problems with multi-GPU fine-tuning:
If "--launcher" is set to none and two or more GPUs are made visible (e.g. CUDA_VISIBLE_DEVICES=0,1), NaN errors occur in the first epoch: "NaN or Inf found in input tensor".
If "--launcher" is set to "pytorch", errors are raised because environment variables such as "RANK" or "WORLD_SIZE" are not defined. In the corresponding block of code, I found a "TO DO".
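For reference, this is roughly what I assumed the "pytorch" launcher branch needs to read. It is only a sketch based on the usual torch.distributed convention, where a launcher such as torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for each process. The helper name read_dist_env is my own and does not come from your repository, and the single-process fallback values are an assumption on my part.

```python
import os


def read_dist_env(environ=None):
    """Return (rank, world_size, local_rank) from the environment.

    Hypothetical helper: falls back to single-process values when the
    variables are missing, instead of raising a KeyError.
    """
    env = os.environ if environ is None else environ
    rank = int(env.get("RANK", 0))              # global index of this process
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    local_rank = int(env.get("LOCAL_RANK", 0))  # index of this process on its node
    return rank, world_size, local_rank


if __name__ == "__main__":
    print(read_dist_env())
```

When I start the script directly with python, none of these variables are set, which would explain the errors above. My guess is that the script has to be started through a distributed launcher (for example torchrun with --nproc_per_node=2) so that they exist, but I have not confirmed that this is how the code is meant to be run.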
Did you run into these problems in your own experiments? Could you tell me how they can be solved, and how your DDP setup is meant to be used? Thanks!