xvxv1024 closed this issue 2 years ago.
Hi, in train_AutoEncoder.py the GAN loss is defined on line 143: gan_loss = GANLoss('vanilla', target_real_label=1.0, target_fake_label=0.0).to(device)
For train_FAR and train_NAR, the default settings do not use the GAN loss, so that line is commented out there.
Thank you for your help! I am a new graduate student and I have another question: why does the weight file "data.pkl" keep getting bigger? I used PredRNN before, and there the file size did not change.
You're welcome. Sorry, I'm not sure which "data.pkl" file you are referring to; the trained models and checkpoints should be saved as ".tar" files.
Thank you for your answer. I'm sorry to keep bothering you, but your paper has been very helpful to me and I want to study it carefully. I noticed that the ".tar" file was getting bigger, so I extracted it and found that the "data.pkl" file inside was what kept growing. The ".tar" file is generated when I run "python train_AutoEncoder.py".
Hi, I've never encountered this problem; the checkpoint files should be the same size across epochs (see the attached screenshot). The ".tar" files are saved by PyTorch, so please refer to the official PyTorch documentation for details about their contents.
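A quick way to see what is actually growing is to list the members of a checkpoint. This is a minimal sketch, assuming the checkpoint was written by torch.save with its default zip-based format (the default since PyTorch 1.6), in which case the ".tar" file is really a zip archive regardless of its extension. In that format the pickled object structure is stored in a member ending in data.pkl, while tensor storages are written as separate members, so a growing data.pkl would point to non-tensor objects in the saved dictionary rather than to the model weights. The function name list_checkpoint_members is hypothetical.

```python
import sys
import zipfile


def list_checkpoint_members(path):
    """Return (member_name, uncompressed_size_in_bytes) pairs for a zip-format checkpoint.

    Assumes the file was written with torch.save's zip-based serialization.
    """
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


if __name__ == "__main__":
    # Usage: python inspect_ckpt.py path/to/checkpoint.tar
    if len(sys.argv) > 1:
        for name, size in list_checkpoint_members(sys.argv[1]):
            print(f"{size:>12}  {name}")
```

Running it on two checkpoints from different epochs and comparing the data.pkl sizes shows whether the growth comes from the pickled metadata or from the tensor data.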
Could you please share your train_AutoEncoder.py file? It's possible that a small change in the code caused the issue I encountered.
Hi, to ensure the reproducibility of the code, the checkpoint save function automatically saves the code files at every epoch: https://github.com/XiYe20/VPTR/blob/b876364ee19100dccde35ef402bcc2fb1930fdf1/utils/train_summary.py#L135. I suspect that your "ckpt_save_dir" is inside the same directory as train_AutoEncoder.py. In that case, every new checkpoint also stores all of the previous checkpoints along with the code, so the checkpoint files grow larger and larger with each epoch. Could you please check this? Thank you very much.
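The effect described above can be reproduced with a small stand-alone simulation. This is a sketch under stated assumptions, not the VPTR save function: it uses the standard pickle module in place of torch.save, a dummy training script, and hypothetical names (collect_code_files, simulate, a "ckpts" folder). Each "checkpoint" stores a copy of every file in the code directory; when the checkpoint folder sits inside that directory, each epoch re-saves all earlier checkpoints, and excluding the folder keeps the size constant.

```python
import os
import pickle
import tempfile


def collect_code_files(code_dir, exclude_dir=None):
    """Read every file under code_dir into a {relative_path: bytes} dict.

    Files under exclude_dir (e.g. a hypothetical checkpoint folder) are skipped.
    """
    exclude = os.path.abspath(exclude_dir) if exclude_dir else None
    files = {}
    for root, dirs, names in os.walk(code_dir):
        if exclude is not None:
            # Prune the excluded folder so os.walk never descends into it.
            dirs[:] = [d for d in dirs
                       if os.path.abspath(os.path.join(root, d)) != exclude]
        for name in names:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                files[os.path.relpath(path, code_dir)] = f.read()
    return files


def simulate(num_epochs, exclude_ckpt_dir):
    """Save one pickled 'checkpoint' per epoch and return their sizes in bytes."""
    with tempfile.TemporaryDirectory() as code_dir:
        # Dummy stand-in for the training script.
        with open(os.path.join(code_dir, "train.py"), "w") as f:
            f.write("print('training')\n" * 100)
        # The checkpoint folder lives INSIDE the code directory.
        ckpt_dir = os.path.join(code_dir, "ckpts")
        os.makedirs(ckpt_dir)
        sizes = []
        for epoch in range(num_epochs):
            state = {
                "epoch": epoch,
                "weights": b"\x00" * 1000,  # dummy stand-in for model parameters
                "code": collect_code_files(
                    code_dir, ckpt_dir if exclude_ckpt_dir else None),
            }
            path = os.path.join(ckpt_dir, f"epoch_{epoch}.pkl")
            with open(path, "wb") as f:
                pickle.dump(state, f)
            sizes.append(os.path.getsize(path))
        return sizes


if __name__ == "__main__":
    print("ckpt folder included:", simulate(4, exclude_ckpt_dir=False))
    print("ckpt folder excluded:", simulate(4, exclude_ckpt_dir=True))
```

In practice the simplest fix is the one suggested above: point "ckpt_save_dir" to a folder outside the code directory, so the code snapshot never picks up earlier checkpoints.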