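I found that if the model was trained on multiple GPUs, the keys in the saved state dict no longer match the model's keys at load time (the saved keys appear to carry a 7-character 'module.' prefix).
Maybe the load code needs something like this added: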
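    saved_state_dict = data['model']
    model = self.accelerator.unwrap_model(self.model)

    # Strip the 'module.' prefix only from keys that actually have it,
    # so single-GPU checkpoints still load unchanged.
    new_state_dict = {}
    for k, v in saved_state_dict.items():
        name = k[7:] if k.startswith('module.') else k
        new_state_dict[name] = v

    if hasattr(model, 'module'):
        model.module.load_state_dict(new_state_dict)
    else:
        model.load_state_dict(new_state_dict)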
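
For what it's worth, here is a standalone sketch of just the key-renaming step, assuming the mismatch really is the 'module.' prefix added when the model is wrapped for multi-GPU training. The helper name strip_module_prefix is my own and not part of this repo, and plain integers stand in for tensors so it runs without PyTorch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from its keys.

    Hypothetical helper for illustration; keys without the prefix are kept as-is.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Plain values stand in for tensors here.
saved = {"module.encoder.weight": 1, "module.encoder.bias": 2, "head.weight": 3}
cleaned = strip_module_prefix(saved)
print(list(cleaned))
```

Checking for the prefix before slicing means the same load path should work for checkpoints saved from either single-GPU or multi-GPU runs. I believe PyTorch also ships torch.nn.modules.utils.consume_prefix_in_state_dict_if_present, which does a similar renaming in place, but I have not tried it in this codebase.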