Closed Asma-Alkhaldi closed 1 year ago
I'm trying to reproduce the results, and I noticed in Stage 1 (train_AutoEncoder.py) that the output checkpoint gets bigger and bigger with each epoch:

Epoch 1 >> 615 MB
Epoch 2 >> 1.2 GB
Epoch 3 >> 2.5 GB
Epoch 4 >> 4.9 GB
Epoch 10 >> 160.3 GB

I couldn't complete the training because it consumed all the available storage. I'm wondering why the checkpoint size keeps changing, and why it is so huge.

Hi, the checkpoint size should be constant. Please see the first issue, https://github.com/XiYe20/VPTR/issues/1, where I showed that each checkpoint is the same size. It is hard for me to help you find the problem without more detailed information.
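A minimal sketch of one common cause of checkpoints that grow every epoch: accumulating per-epoch data (e.g. a history of past states or losses) inside the dict that gets serialized each time, instead of saving only the current state. This is a hypothetical illustration, not the actual VPTR code; plain pickle stands in here for torch.save, which serializes in a similar dict-based way.

```python
import pickle

# Hypothetical model state (stands in for a state_dict).
state = {"weights": [0.0] * 1000}

def buggy_checkpoint(history, epoch):
    # Bug pattern: the checkpoint drags along everything saved so far,
    # so each epoch's file is strictly larger than the last.
    history.append({"epoch": epoch, "weights": list(state["weights"])})
    return pickle.dumps({"state": state, "history": history})

def fixed_checkpoint(epoch):
    # Fix: serialize only the current state; the size stays constant.
    return pickle.dumps({"epoch": epoch, "state": state})

history = []
buggy_sizes = [len(buggy_checkpoint(history, e)) for e in range(5)]
fixed_sizes = [len(fixed_checkpoint(e)) for e in range(5)]
print(buggy_sizes)  # strictly increasing
print(fixed_sizes)  # constant
```

In PyTorch specifically, the analogous mistakes are saving a dict that keeps appending per-epoch entries, or storing loss tensors that still carry their autograd graphs instead of plain floats (`loss.item()`).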