ShiqiYu / OpenGait

A flexible and extensible framework for gait recognition. You can focus on designing your own models and comparing with state-of-the-arts easily with the help of OpenGait.
Why do inference results differ when setting different nproc_per_node #196

Closed: light201212 closed this issue 2 months ago

light201212 commented 3 months ago

I get different test results depending on whether I set --nproc_per_node=1 or --nproc_per_node=2:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=1 opengait/main.py --cfgs ./configs/deepgaitv2/DeepGaitV2_gait3d.yaml --phase test

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 opengait/main.py --cfgs ./configs/deepgaitv2/DeepGaitV2_gait3d.yaml --phase test

Could you give me some help?

updatelse commented 3 months ago

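CUDA_VISIBLE_DEVICES=0,1 controls visibility: it gives main.py permission to use two GPUs. With --nproc_per_node=1, however, the program tests on only one of them (GPU 0). With --nproc_per_node=2, the program uses both GPUs it can see (GPU 0 and GPU 1) for testing. Testing on two GPUs will certainly give higher metrics than testing on one; I also get higher metrics testing on eight GPUs than on four. The batch size fed in at test time affects the metrics as well (the author's original batch size works best). This is only my personal opinion.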
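To make the visibility-versus-process-count point concrete, here is a minimal sketch in plain Python. It assumes the common pattern in which each launched process binds to the visible device whose position equals its local rank; whether OpenGait binds devices exactly this way is an assumption, and the helper name devices_in_use is hypothetical.

```python
def devices_in_use(cuda_visible_devices: str, nproc_per_node: int) -> dict:
    """Map each local rank to the physical GPU id it would bind to.

    Assumption: process `local_rank` uses the `local_rank`-th entry of
    CUDA_VISIBLE_DEVICES (the usual set_device(local_rank) pattern).
    This helper is illustrative only and is not part of OpenGait or PyTorch.
    """
    # Physical GPU ids exposed to the job, in the order they were listed.
    visible = [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]
    if nproc_per_node > len(visible):
        raise ValueError("more processes requested than visible GPUs")
    # One process per local rank; any remaining visible GPUs stay idle.
    return {local_rank: visible[local_rank] for local_rank in range(nproc_per_node)}


if __name__ == "__main__":
    print(devices_in_use("0,1", 1))  # one process, second GPU unused
    print(devices_in_use("0,1", 2))  # two processes, one per visible GPU
```

Under this assumption, the first command in the question runs a single process on GPU 0 while GPU 1 stays visible but idle, and the second command runs one process on each GPU.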

light201212 commented 3 months ago

With the same model and the same samples, the inference results should be unique, so how can the metrics not be?