This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)
Loss is NaN when using Mixed-precision training #11
Hi, I'm trying to reproduce your results but have run into a problem. The configuration file I'm using is w48_640_adam_lr1e-3.yaml, which enables mixed-precision training. During training, the loss frequently becomes NaN, such as:
I'm training on 2 NVIDIA Tesla V100 GPUs, so I changed IMAGES_PER_GPU from 10 to 20; all other configuration and code remain the same. I'm confused about what is causing so many NaNs. Any help would be appreciated, thanks! @bowenc0221 @leoxiaobin
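For reference, here is a minimal, generic sketch of one way to localize this kind of problem: run mixed precision with dynamic loss scaling and guard against non-finite losses before they corrupt the weights. This is not the repository's actual training loop; it uses PyTorch's native `torch.cuda.amp` rather than the FP16 path in this repo, and the toy model, loss, and data below are placeholders.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Toy stand-ins for the real network, loss, and data loader; they only
# illustrate the pattern and are not the HigherHRNet training code.
model = nn.Sequential(nn.Conv2d(3, 17, 3, padding=1)).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

scaler = GradScaler()  # dynamic loss scaling: the scale is reduced after an overflow

for step in range(100):
    images = torch.randn(4, 3, 128, 128, device="cuda")
    targets = torch.randn(4, 17, 128, 128, device="cuda")

    optimizer.zero_grad()
    with autocast():  # FP16 forward pass
        loss = criterion(model(images), targets)

    if not torch.isfinite(loss):
        # Log and skip the update instead of propagating NaNs into the weights.
        print(f"step {step}: non-finite loss {loss.item()}, skipping")
        continue

    scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)         # unscales gradients; skips the step if they are inf/NaN
    scaler.update()                # adapts the loss scale for the next iteration
```

If the guard fires on the very first iterations, the problem is usually in the data or the loss itself (e.g. a degenerate target) rather than in the loss scaling; if it only starts firing after many iterations, a too-high learning rate for the effective batch size is a common culprit.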