ZhexinLiang / CLIP-LIT

[ICCV 2023, Oral] Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
https://zhexinliang.github.io/CLIP_LIT_page/

Number of training iterations & choice of the best model #10

Open Atmyre opened 10 months ago

Atmyre commented 10 months ago

Hello,

I have the following questions:

  1. In the paper you said that the total number of training iterations is set to 50k. In the default training parameters in the code, num_epochs is set to 3000. How does the total number of iterations relate to num_epochs?
  2. How did you choose the final model? When I run training, the results produced in each round differ considerably. Did you choose the last model, or the best one by some metric?

I would really appreciate your answers.

ZhexinLiang commented 10 months ago

Hi, thanks for your interest in our paper.

  1. num_epochs is set to 2000 in the default code (train.py) and higher (3000) in the README, to ensure the model converges if the user trains from scratch. The epoch count follows from the iteration budget, the dataset size, and the batch size: with a batch size of 16 and 368 training images, 50K iterations is about 2174 epochs; with a batch size of 8, it is about 1087 epochs (see the sketch after this list). You can adjust num_epochs according to the batch size you set. If you train from the initial model I provided, you can set num_epochs to 1000 or even less.
  2. I tried a few models and chose the one with the best metrics (e.g. the loss); in my experiments, the last model was the best most of the time. You can also pick the model whose results look best in the folder "./train0/result_train0/" (the default output path).
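
For reference, a minimal sketch of the iteration-to-epoch arithmetic above (the dataset size, batch sizes, and iteration budget are the ones mentioned in this thread; the function name is just for illustration):

```python
import math

def iterations_to_epochs(total_iterations: int, num_images: int, batch_size: int) -> int:
    """Convert an iteration budget into an approximate epoch count."""
    iterations_per_epoch = math.ceil(num_images / batch_size)  # batches per pass over the data
    return math.ceil(total_iterations / iterations_per_epoch)

# Numbers from this thread: 368 training images, 50K iterations.
print(iterations_to_epochs(50_000, num_images=368, batch_size=16))  # -> 2174 epochs
print(iterations_to_epochs(50_000, num_images=368, batch_size=8))   # -> 1087 epochs
```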

Sorry for the late response. I am working on my next project. Feel free to ask in this issue if you have more questions.

Atmyre commented 8 months ago

Hello again,

May I also ask why you used DIV2K_384 images as gt for training, not BAID_380/resize_gt ?

ZhexinLiang commented 8 months ago

Hi @Atmyre,

Our method operates in an unsupervised manner: it does not require any paired data, which gives it better generalization ability rather than overfitting to a specific dataset.

Shuffling BAID_380/resize_gt and using it as GT would also yield "unpaired" data. However, since the BAID_380/resize_gt images are manually retouched and may not accurately represent the distribution of real images, we chose the DIV2K dataset instead. You can also try other well-lit image datasets as GT if needed.
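
To illustrate the unpaired setup, here is a minimal sketch of a dataset that draws backlit inputs and well-lit references from two independent folders, so no input/GT correspondence exists; the class name, paths, and transform are hypothetical and this is not the repo's actual dataloader:

```python
import random
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class UnpairedBacklitDataset(Dataset):
    """Backlit inputs and well-lit references come from separate folders;
    the reference is sampled independently for each item."""

    def __init__(self, backlit_dir, welllit_dir, transform):
        self.backlit_paths = sorted(Path(backlit_dir).glob("*.png"))
        self.welllit_paths = sorted(Path(welllit_dir).glob("*.png"))
        self.transform = transform

    def __len__(self):
        return len(self.backlit_paths)

    def __getitem__(self, idx):
        backlit = Image.open(self.backlit_paths[idx]).convert("RGB")
        # Draw the well-lit reference at random: any well-lit image can serve as GT.
        welllit = Image.open(random.choice(self.welllit_paths)).convert("RGB")
        return self.transform(backlit), self.transform(welllit)

# Hypothetical usage: backlit inputs with DIV2K well-lit references.
# dataset = UnpairedBacklitDataset("./data/BAID_380/input", "./data/DIV2K_384", transform)
```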

If you still choose to use BAID_380/resize_gt as GT, it might improve performance on the BAID test set but could worsen performance on the Backlit300 test set compared to our current checkpoint, as the model may overfit to the BAID dataset.

To emphasize, one of our motivations is to propose a framework for training models in situations where obtaining real ground truth data is not feasible.

Feel free to discuss with me if you have any other questions.

Atmyre commented 8 months ago

Thanks for your help!