ZeyuYan / Controllable-Perceptual-Compression


Some problems encountered when reproducing #1

Open umichyujl opened 1 year ago

umichyujl commented 1 year ago

Hi Zeyu! This is Yujie Liu from the University of Michigan. My group is trying to reproduce your paper for our Machine Learning class final project, and we ran into some problems with your RGB code:

  1. Which photos, and how many, did you use for the first training stage in train.py? In the paper you trained $G_d$, $G_p$, $G_h$ on COCO2014, which is a vast dataset, but in Appendix D the image samples are from the KODAK dataset, which is relatively small. Is KODAK here acting as the test set?
  2. For train2.py, which should be the stage-2 training: the first problem is that we cannot generate ./model/MMSE_model.pt after running stage-1 training, and your description of the training process is inconsistent between readme.txt and /src/README.md (the User Guide). If I follow this part of the User Guide:
  • Then use the checkpoint of the trained base model to 'warmstart' the GAN architecture. Training the generator and discriminator from scratch was found to result in unstable training, but YMMV.
# Train using full generator-discriminator loss
python3 train.py --model_type compression_gan --regime low --n_steps 1e6 --warmstart --ckpt path/to/base/checkpoint

I get the following error:

Traceback (most recent call last):
  File "/content/gdrive/MyDrive/RGBcode/train.py", line 328, in
    model_type=args.model_type, current_args_d=dictify(args), strict=False, prediction=False)
  File "/content/gdrive/MyDrive/RGBcode/src/helpers/utils.py", line 189, in load_model
    loaded_args_d = checkpoint['args']
KeyError: 'args'

Basically, I suspect something is wrong with your checkpoint because it does not have the key 'args', but I am not sure.
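For what it's worth, here is a minimal sketch of how one could inspect the checkpoint defensively; the fallback behaviour is my own guess, not your repository's logic:

import torch

def inspect_checkpoint_args(ckpt_path):
    # Minimal sketch: load a checkpoint and report whether it carries the
    # 'args' entry that src/helpers/utils.py:load_model expects.
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    if 'args' in checkpoint:
        return checkpoint['args']
    # No 'args' key: return None so the caller can fall back to the
    # current command-line arguments instead (my guess at a workaround).
    print(f"No 'args' key in {ckpt_path}; available keys: {list(checkpoint.keys())}")
    return None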

We hope you can help us with these issues! Thanks!

Claire-YC commented 1 year ago

Hi Zeyu! My name is Yang Cao from the University of Michigan; I am another member of Yujie's group. Great work on your paper! However, I ran into some problems when reproducing the MNIST part.

  1. I cannot reproduce Figure 2 of the paper when training multiple models with varying $\beta$. To be more specific, only $\beta=0$ gives $\mathrm{MSE}(G_p)$ equal to $\mathrm{MSE}(G_d)$; every other $\beta \in (0,0.5)$ ends up with $\mathrm{MSE}(G_p) = 2\,\mathrm{MSE}(G_d)$. I tried tuning hyperparameters, but that did not help, and I did not change any part of the code. (The sketch at the end of this comment shows how I measure the two MSEs.)

  2. For Figure 3, which exact DAL model are you using?

Any advice would be helpful. Thank you so much.
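For context, this is roughly how I measure the two reconstruction errors on the MNIST test set; the encoder/decoder call signatures and the data loader are placeholders for my local setup, not the repository's exact interfaces:

import torch

def measure_mses(encoder, G_d, G_p, test_loader, device='cpu'):
    # Sketch only: accumulate per-pixel squared error of both decoders
    # over the whole test set and return MSE(Gd), MSE(Gp).
    se_d, se_p, n = 0.0, 0.0, 0
    with torch.no_grad():
        for x, _ in test_loader:
            x = x.to(device)
            z = encoder(x)
            se_d += torch.sum((G_d(z) - x) ** 2).item()
            se_p += torch.sum((G_p(z) - x) ** 2).item()
            n += x.numel()
    return se_d / n, se_p / n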

ZeyuYan commented 1 year ago

  1. The training dataset was COCO2014 and the test dataset was KODAK. The training setup was almost the same as in HiFiC. We test on KODAK because its number of images is relatively small; computing the PI score would take too much time if the test dataset were really large.
  2. After stage-1 training, there should be a folder "experiments"; that is where the MMSE model is saved.

ZeyuYan commented 1 year ago

I have edited train.py, so you can try again. For Figure 3, we just set the loss function to L = L_mse + λ*L_adv, as in "On perceptual lossy compression: The cost of perceptual reconstruction and an optimal training framework".

umichyujl commented 1 year ago

Thanks for clarifying the training/test sets! For stage-1 training, I found the "experiments" folder where the MMSE model is saved, but it contains multiple files, one per epoch: [screenshot 2022-10-27: "experiments" folder listing one checkpoint file per epoch]

And there is no MMSE_model.pt there. So what is the exact command for --warmstart --ckpt here? Do we need to retrain each epoch, or is something else going on here? Thanks!

Claire-YC commented 1 year ago

Hi Zeyu! Thanks for the reply! Problem 1 is solved, but I'm a little confused about your response to question 2, because the paper you mentioned still doesn't use the DAL framework.

ZeyuYan commented 1 year ago

Model checkpoints are saved every epoch during training. You can just rename the latest checkpoint to MMSE_model.pt.
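For example, something along these lines copies the newest checkpoint into place; the glob pattern and folder layout are assumptions about a typical run, so adjust the paths to your setup:

import glob
import os
import shutil

# Sketch: pick the most recently written checkpoint under "experiments"
# and copy it to ./model/MMSE_model.pt for the stage-2 (train2.py) run.
ckpts = glob.glob('experiments/**/*.pt', recursive=True)
latest = max(ckpts, key=os.path.getmtime)
os.makedirs('model', exist_ok=True)
shutil.copy(latest, 'model/MMSE_model.pt')
print(f'Copied {latest} -> model/MMSE_model.pt')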

ZeyuYan commented 1 year ago

The DAL framework here means that the encoder and decoder are trained jointly in an end-to-end way with the loss function L_mse + λ*L_adv, which is the approach most previous learning-based compression models have applied.
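As a rough illustration only (not the repository's actual training code), the generator-side objective in such a jointly trained setup could be written like this; the non-saturating adversarial form and the value of λ are assumptions:

import torch
import torch.nn.functional as F

def joint_generator_loss(x, x_hat, disc_logits_fake, lam=0.01):
    # L = L_mse + lambda * L_adv, computed on a batch.
    # disc_logits_fake: discriminator logits on the reconstruction x_hat.
    l_mse = F.mse_loss(x_hat, x)
    # Non-saturating GAN term: push the discriminator towards labelling
    # the reconstruction as real (this specific form is an assumption).
    l_adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return l_mse + lam * l_adv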

Claire-YC commented 1 year ago

Thanks for the reply! Although I solved the problem with $\beta$ and got the same trend as the paper proposes, the updated code for the MSE calculation differs from the MSE equation. Could you elaborate on that a little bit? Thank you!
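For reference, the MSE I had in mind when reading the equation is the standard per-pixel empirical definition below; this is just the textbook form, not necessarily what the updated code computes:

$\mathrm{MSE}(G) = \frac{1}{N}\sum_{i=1}^{N} \frac{\lVert x_i - G(z_i) \rVert_2^2}{d}$

where $x_i$ is a test image with $d$ pixels, $z_i$ is its encoding, and $N$ is the number of test images.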