As I understand it, the number of models saved in the checkpoints folder should equal the number of datasets. With single-node training, however, self.args.rank always stays 0, so every checkpoint is named "bestmodel_{self.args.rank}.pth", which always resolves to "bestmodel_0.pth". That would seem to leave only one file in the folder, with each save overwriting the previous one. Am I missing something here? Thank you so much!
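
To make the naming concern concrete, here is a minimal sketch of what I think is happening. It does not use the repository's actual code: `args` is a hypothetical stand-in for the trainer's arguments, and the dataset names are made up for illustration. Only the filename pattern comes from the code in question.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the trainer's args; only `rank` matters here.
# On a single node, rank is assumed to stay 0 for the whole run.
args = SimpleNamespace(rank=0)

# Made-up dataset names, used only for illustration.
datasets = ["dataset_a", "dataset_b", "dataset_c"]

# Build one checkpoint name per dataset using the pattern from the question.
checkpoint_names = [f"bestmodel_{args.rank}.pth" for _ in datasets]

# Every name is identical, so each save would target the same file.
unique_names = set(checkpoint_names)
print(unique_names)
```

If this matches the real behavior, three datasets would produce three saves but only one distinct filename.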