CGuangyan-BIT / PointGPT

[NeurIPS 2023] PointGPT: Auto-regressively Generative Pre-training from Point Clouds
MIT License

Problem in Multi card finetune #18

Open liuyifan22 opened 10 months ago

liuyifan22 commented 10 months ago

Hello! I am fascinated by your great idea and have been experimenting with your code, but I think there may be a problem with multi-GPU finetuning. If "--launcher" is set to none and two or more GPUs are made visible, e.g. CUDA_VISIBLE_DEVICES=0,1, NaN problems occur in the first epoch: "NaN or Inf found in input tensor". If "--launcher" is set to "pytorch", errors are raised about environment variables such as "RANK" or "WORLD_SIZE" not being defined. In the corresponding block, I found a "TO DO".

Did you run into this problem in your own experiments? Could you tell me how it can be solved, and how your DDP setup should be used? Thanks!

CGuangyan-BIT commented 6 months ago

Hi! Data parallel should be feasible; you can give it a try
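
On the "RANK" / "WORLD_SIZE" errors reported above: torch.distributed's default "env://" initialization reads RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT from the environment. A launcher such as torchrun normally sets them, so they are likely to be missing when the script is started with plain python. The snippet below is only a minimal sketch for checking this before training starts. It is not code from this repository, and the helper name missing_dist_env is hypothetical.

```python
import os

# Variables read by torch.distributed's default "env://" initialization.
REQUIRED_DIST_VARS = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def missing_dist_env(env=None):
    """Return the required distributed variables that are absent from env.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DIST_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_dist_env()
    if missing:
        print("Missing distributed variables:", ", ".join(missing))
    else:
        print("All distributed environment variables are set.")
```

If the list is non-empty, the process was probably not started by a distributed launcher. Whether the finetuning script then runs correctly under such a launcher is not confirmed in this thread.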