IrvingMeng / MagFace

MagFace: A Universal Representation for Face Recognition and Quality Assessment, CVPR2021, Oral
Apache License 2.0

Saving checkpoint failed #48

Open ghost opened 2 years ago

ghost commented 2 years ago

Hi, when I train the model with the script run_dist.sh, the checkpoint cannot be saved when the first epoch finishes. However, the GPU is still occupied and there are no more logs. Any advice?