Open indigopyj opened 2 years ago
same here
Hi @indigopyj , @AiueoABC ,
Did either of you find a solution to this issue?
I discovered that the new Adam optimizer has parameter group sizes of [239], but the loaded optimizer state has parameter group sizes of [239, 18].
The additional 18 parameters are added in BaseTrainer, for training the reID classification layers.
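Here is a minimal, self-contained sketch of the mismatch (assumption: this mirrors the situation with toy modules, not the actual FairMOT code — the `base`/`id_head` names and sizes are illustrative). The saved optimizer state has two param groups, but the freshly created optimizer has only one at load time:

```python
import torch

base = torch.nn.Linear(4, 4)     # stands in for the main network params
id_head = torch.nn.Linear(4, 2)  # stands in for the reID classifier params

# Optimizer as it was saved: the trainer had already added the reID group.
saved_opt = torch.optim.Adam(base.parameters())
saved_opt.add_param_group({'params': id_head.parameters()})
checkpoint = {'optimizer': saved_opt.state_dict()}

# On resume, the optimizer is loaded before the trainer adds the second
# group, so the group counts differ (1 vs 2) and loading fails.
new_opt = torch.optim.Adam(base.parameters())
try:
    new_opt.load_state_dict(checkpoint['optimizer'])
except ValueError as e:
    print(e)  # loaded state dict has a different number of parameter groups
```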
I guess the simplest way to get it working without shifting around a bunch of code would be to just drop these additional params during loading.
I achieved this by adding the following line before loading the optimizer in /src/lib/models/model.py
del checkpoint['optimizer']['param_groups'][1]
EDIT:
Sorry, no: we should not delete the additional parameters. Instead, we can simply move the load_model call to after the trainer is instantiated in train.py. That way the trainer adds its params to the optimizer before the saved state dict is loaded.
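The reordering can be sketched like this (assumption: toy modules stand in for the real network and reID head; the point is only the order of operations, not the actual train.py code):

```python
import torch

base = torch.nn.Linear(4, 4)     # stands in for the main network params
id_head = torch.nn.Linear(4, 2)  # stands in for the reID classifier params

# Checkpoint written during training: both param groups are present.
saved_opt = torch.optim.Adam(base.parameters())
saved_opt.add_param_group({'params': id_head.parameters()})
checkpoint = {'optimizer': saved_opt.state_dict()}

# Resume with the fixed order: let the "trainer" add the reID group
# first ...
new_opt = torch.optim.Adam(base.parameters())
new_opt.add_param_group({'params': id_head.parameters()})
# ... and only then load the checkpointed state; the group counts now
# match (2 vs 2) and no ValueError is raised.
new_opt.load_state_dict(checkpoint['optimizer'])
print(len(new_opt.param_groups))  # 2
```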
I fine-tuned the model by my custom dataset. Then I wanted to resume training by
python ./src/train.py --resume
but I got this issue:
ValueError: loaded state dict has a different number of parameter groups
I didn't change the model structure, the number of classes, or anything else about the training setup.
How can I resume training? Please help me.