MarcoForte / FBA_Matting

Official repository for the paper F, B, Alpha Matting
MIT License
464 stars 95 forks

Questions about training and test details #6

xymsh closed this issue 4 years ago

xymsh commented 4 years ago

Hi, thank you for sharing your code and paper!

I have recently been reproducing your work, and I have some questions about the training and test details.

  1. The paper says the RAdam optimizer is used with momentum 0.9 and weight decay 0.0001, but I couldn’t find a “momentum” parameter in the official RAdam optimizer code. Did you modify the official code, or just set beta1 to 0.9?

  2. The paper gives two descriptions of the weight decay: a) weight decay 0.0001 in the RAdam optimizer, and b) weight decay of 0.005 and 1e-5 applied to the convolutional weights and GN parameters. How did you set them in your code? I set the weight decay in the optimizer to 0.0001 and added an L2 loss on the conv weights and on the GN weights & biases, weighted by 0.005 and 1e-5 respectively, because I think an L2 loss here is equivalent to weight decay. Is that the same as what you did?

  3. The input resolution of the training patches. Patches of size 640, 480, and 320 are randomly cropped during training. After that, did you resize them to a fixed size? If not, how do you train the model with batch size 6?

  4. The number of input channels at test time. Section 3.6 says, “During inference, the full-resolution input images and trimaps are concatenated as 4-channel input and fed into the network.” But the code uses an 11-channel input. Is that a typo?

  5. About the re-estimated foreground. I tried to re-estimate the training foreground images, but it only succeeded for 411 of the 431 foregrounds; the other 20 failed to optimize. Did you have the same problem? Also, when calculating the alpha * foreground error during testing, are the ground-truth foreground images re-estimated by closed form?

Thank you !

MarcoForte commented 4 years ago

Hi, thanks for your interest in my project. Let me help clarify these points.

  1. We use the default betas from this RAdam implementation: https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py

  2. Here's the code snippet for setting the weight decay and learning rates:
    https://gist.github.com/MarcoForte/e1518c1d927057ea20ada841ac63642c It is applied with this line: RAdam(group_weight(model, 1e-5, 1e-5, 0.0005)) The reference to 0.0001 is a typo; thank you for spotting it. I'll correct it in the camera-ready version.

  3. In batch size 6 mode I only use crops of 320x320.

  4. We also included the two-channel binary trimap, which makes the total number of channels 3 (RGB) + 6 (Gaussian) + 2 (binary) = 11. I'll correct this.

  5. When calculating the alpha*F error, I use the foreground they provide as the ground truth.
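
For readers reproducing point 2, here is a minimal sketch of how per-layer weight decay can be passed to an optimizer through parameter groups. This is not the code from the gist: `build_param_groups` is a hypothetical helper, its argument names and order are assumptions, and the thread does not say which argument of `group_weight` maps to which setting, so the numbers below are only illustrative values borrowed from the call quoted above. `torch.optim.RAdam` is used as a stand-in for the linked RAdam implementation, and the model is a toy one with an 11-channel input.

```python
import torch
import torch.nn as nn


def build_param_groups(model, lr, conv_weight_decay, norm_weight_decay):
    """Hypothetical helper: split parameters into optimizer groups by layer type."""
    conv_weights, norm_params, others = [], [], []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            conv_weights.append(module.weight)
            if module.bias is not None:
                others.append(module.bias)
        elif isinstance(module, nn.GroupNorm):
            # With affine=False the weight and bias are None, so filter them out.
            norm_params.extend(
                p for p in (module.weight, module.bias) if p is not None
            )
        else:
            # Parameters owned directly by any other module type.
            # Containers such as nn.Sequential own none, so nothing is duplicated.
            others.extend(module.parameters(recurse=False))

    groups = [
        {"params": conv_weights, "lr": lr, "weight_decay": conv_weight_decay},
        {"params": norm_params, "lr": lr, "weight_decay": norm_weight_decay},
        {"params": others, "lr": lr, "weight_decay": 0.0},
    ]
    # Optimizers reject empty groups, so drop any that ended up with no parameters.
    return [g for g in groups if g["params"]]


# Toy model for illustration only (11 input channels, as discussed in point 4).
model = nn.Sequential(
    nn.Conv2d(11, 8, kernel_size=3, padding=1),
    nn.GroupNorm(4, 8),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),
)

# Illustrative values only; the mapping to the author's settings is an assumption.
groups = build_param_groups(
    model, lr=1e-5, conv_weight_decay=5e-4, norm_weight_decay=1e-5
)

# Stand-in optimizer: betas are left at their defaults, only the groups are passed.
optimizer = torch.optim.RAdam(groups)

for g in optimizer.param_groups:
    print(len(g["params"]), g["lr"], g["weight_decay"])
```

Each group carries its own `lr` and `weight_decay`, so the optimizer needs no global weight decay argument. Whether the optimizer's weight decay matches an explicit L2 loss term depends on how the particular implementation applies it, so that is worth checking against the RAdam code you actually use.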

xymsh commented 4 years ago

Thanks!

For the first one, all the default parameters in the RAdam code you linked are used, except the learning rate, right?

MarcoForte commented 4 years ago

Yes, exactly.

facetohard commented 4 years ago

Thanks for providing more details. One more question, about the transformed trimap: did you calculate it before the crop, or re-calculate it after the crop?

zoezhou1999 commented 4 years ago

Hi, @xymsh. Did you complete the reimplementation? I am also reimplementing this. Could we discuss something related via messages? Thank you so much!

liangyufz commented 3 years ago

Hello, the link https://gist.github.com/MarcoForte/e1518c1d927057ea20ada841ac63642c is no longer accessible. Could you please update it or commit it?
