Closed geyutang closed 5 years ago
Well, actually I don't fully understand the mechanism that torch.save() uses either, so let's set this weird TypeError aside and simply save model.state_dict().
Yes, there is one thing to pay attention to. This loss has three buffers and this one has two, so you should also save the buffer values. When loading the checkpoint, please load it AFTER the Trainer is initialized (because the initialization registers the buffers), e.g. insert the loading code here, and then set the buffer values.
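To illustrate the ordering issue, here is a minimal sketch. It assumes the loss is an nn.Module whose buffers are created with register_buffer; ToyLoss and its buffer names are hypothetical stand-ins, not the classes from this repository. Under that assumption the buffer values are part of state_dict(), and they can only be loaded once the module has been constructed.

```python
import io

import torch
import torch.nn as nn


class ToyLoss(nn.Module):
    """Hypothetical loss that keeps running statistics in buffers."""

    def __init__(self):
        super().__init__()
        # Buffers are registered at construction time, so they exist only
        # after the module (or the Trainer that builds it) is initialized.
        self.register_buffer("running_mean", torch.zeros(1))
        self.register_buffer("num_updates", torch.zeros(1))

    def forward(self, x):
        # Update the buffers in place, then compute a toy loss value.
        self.running_mean.mul_(0.9).add_(0.1 * x.detach().mean())
        self.num_updates += 1
        return (x - self.running_mean).pow(2).mean()


loss_fn = ToyLoss()
loss_fn(torch.ones(4))  # one call, so the buffers hold non-initial values

# Persistent buffers are included in state_dict(), so this saves them too.
storage = io.BytesIO()  # stands in for a checkpoint file on disk
torch.save(loss_fn.state_dict(), storage)

# Resuming: construct first (this registers the buffers), then load.
storage.seek(0)
restored = ToyLoss()
restored.load_state_dict(torch.load(storage, map_location="cpu"))
```

If the buffers live on an object that is not an nn.Module, the same idea applies, but the values have to be stored in the checkpoint and assigned back by hand after initialization.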
Please feel free to let me know if you run into any further problems.
Thanks for your kind reply. I will try this. This TypeError has bothered me for a long time.
You are welcome :)
First, I want to check the right way to resume the model.
After following the steps above, the error still occurs, even after downgrading PyTorch to 1.0.0 (my previous version was 1.1.0).
The checkpoint loads successfully, but saving the newly trained checkpoint fails.
If I have to change the save method so that it only saves model.state_dict(), do you have any suggestions for this change? Do I only need to save model.state_dict() and the epoch to the checkpoint? Are there any other details I need to pay attention to?
Thanks for your attention and kind reply.
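For reference, a common pattern for a state_dict-only checkpoint is sketched below. This is a generic PyTorch example, not code from this repository: the nn.Linear model, the SGD optimizer, and the dictionary keys are placeholders chosen for illustration. Besides model.state_dict() and the epoch, it also stores optimizer.state_dict(), which is usually needed to resume training faithfully (for example, to keep momentum values).

```python
import io

import torch
import torch.nn as nn

# Placeholder model and optimizer (assumptions for this sketch).
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step so that the optimizer has momentum state to save.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# Save plain dictionaries and numbers instead of whole Python objects.
checkpoint = {
    "epoch": 3,  # last finished epoch (example value)
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}
storage = io.BytesIO()  # stands in for a checkpoint file on disk
torch.save(checkpoint, storage)

# Resume: rebuild the objects first, then load the saved states into them.
storage.seek(0)
loaded = torch.load(storage, map_location="cpu")
resumed_model = nn.Linear(4, 2)
resumed_optimizer = torch.optim.SGD(
    resumed_model.parameters(), lr=0.1, momentum=0.9
)
resumed_model.load_state_dict(loaded["model"])
resumed_optimizer.load_state_dict(loaded["optimizer"])
start_epoch = loaded["epoch"] + 1  # continue with the next epoch
```

If a learning-rate scheduler or loss buffers are in use, their states would go into the same dictionary under additional keys.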