Closed ZhugeKongan closed 2 years ago
update_old_model_params is used when train.py's --resume-from option is provided to resume from a previous checkpoint and that checkpoint was created with an earlier version of the repository. The implementation handles the 'module.' prefix in the state-dictionary keys, which is present when DataParallel is in use on multi-GPU systems and absent on single-GPU systems.
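As a rough illustration of the prefix handling described above, here is a minimal sketch. It uses plain dicts to stand in for a real PyTorch state dict, and the helper name `strip_module_prefix` is hypothetical, not taken from the repository:

```python
# Sketch: normalize the 'module.' prefix that torch.nn.DataParallel
# prepends to state-dict keys on multi-GPU systems, so the checkpoint
# can be loaded on a single-GPU (non-DataParallel) model.
# NOTE: strip_module_prefix is an illustrative helper, not repo code.

def strip_module_prefix(state_dict):
    """Remove a leading 'module.' from every key, if present."""
    prefix = 'module.'
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# A checkpoint saved from a DataParallel-wrapped model:
multi_gpu_sd = {'module.conv1.weight': 1, 'module.fc.bias': 2}

# After normalization the keys match a single-GPU model:
print(strip_module_prefix(multi_gpu_sd))
# {'conv1.weight': 1, 'fc.bias': 2}
```

The reverse direction (adding the prefix back when loading a single-GPU checkpoint into a DataParallel model) follows the same pattern with string concatenation.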
I could not reproduce an evaluation problem on a multi-GPU system:

A sample model was trained on a multi-GPU system with ./scripts/train_mnist.sh. The trained model was evaluated on the same system with ./scripts/evaluate_mnist.sh. The trained model (/ai8x-synthesis/trained/ai85-mnist-qat8-q.pth.tar) was then copied to the same location on a single-GPU system and evaluated again with ./scripts/evaluate_mnist.sh.
I could not reproduce any evaluation error on multi-GPU systems. Closing the issue.
I want to know: what is the purpose of update_old_model_params in train.py?

```python
elif args.load_model_path:
    print('2222')
```
This can lead to incorrect parameter loading when multi-GPU training is used. This may need to be addressed.