Closed — xymsh closed this issue 4 years ago
Hi, thanks for your interest in my project. Let me clarify these points.
We use the default betas from this RAdam implementation: https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py
Here's the code snippet for setting the weight decay and learning rates:
https://gist.github.com/MarcoForte/e1518c1d927057ea20ada841ac63642c
It is applied with this line: `RAdam(group_weight(model, 1e-5, 1e-5, 0.0005))`
The reference to 0.0001 is a typo; thank you for spotting it. I'll correct this in the camera-ready version.
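Since the gist linked above is reported inaccessible later in this thread, here is a rough sketch of what `group_weight` plausibly does, given the call `RAdam(group_weight(model, 1e-5, 1e-5, 0.0005))`: parameters are split into groups so that convolutional/linear weights receive weight decay while normalization parameters and biases do not. The name-based heuristics and group layout below are illustrative assumptions, not the author's exact code.

```python
# Hypothetical sketch of group_weight (assumed behavior; the original
# gist is not accessible): conv/linear weights get weight decay, norm
# parameters and biases do not. The ".weight"/"norm" name heuristics
# are illustrative assumptions.
def group_weight(named_params, lr_decay_group, lr_no_decay_group, weight_decay):
    decay, no_decay = [], []
    for name, param in named_params:
        if name.endswith(".weight") and "norm" not in name.lower():
            decay.append(param)      # e.g. conv weights: decayed
        else:
            no_decay.append(param)   # norm scale/shift, biases: no decay
    return [
        {"params": decay, "lr": lr_decay_group, "weight_decay": weight_decay},
        {"params": no_decay, "lr": lr_no_decay_group, "weight_decay": 0.0},
    ]

# Mirrors RAdam(group_weight(model, 1e-5, 1e-5, 0.0005)); strings stand
# in for parameter tensors here.
named_params = [
    ("encoder.conv1.weight", "w1"),
    ("encoder.norm1.weight", "g1"),
    ("encoder.conv1.bias", "b1"),
]
groups = group_weight(named_params, 1e-5, 1e-5, 0.0005)
```

In a real training script, `named_params` would come from `model.named_parameters()` and the returned list of param groups would be passed straight to the optimizer.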
In the batch-size-6 mode I only use crops of 320x320.
We also include the two-channel binary trimap, which makes the total channel count 3 (rgb) + 6 (gaussian) + 2 (binary) = 11. I'll correct this.
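To make the channel arithmetic concrete, here is a toy sketch of one way the 6 Gaussian trimap channels could be built: each of the two binary masks (definite foreground, definite background) is expanded into `exp(-d(x)^2 / (2*sigma^2))` at three scales, where `d(x)` is the distance to the nearest mask pixel. The sigma values and the brute-force distance transform are illustrative assumptions, not the paper's exact encoding.

```python
import math

# Toy sketch of a Gaussian-encoded trimap (assumed form): one channel
# per (mask, sigma) pair, value exp(-d^2 / (2*sigma^2)) where d is the
# distance to the nearest mask pixel. Sigmas are illustrative guesses.
def gaussian_trimap_channels(mask, sigmas=(1.0, 2.0, 4.0)):
    h, w = len(mask), len(mask[0])
    ones = [(i, j) for i in range(h) for j in range(w) if mask[i][j]]
    channels = []
    for sigma in sigmas:
        ch = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                # brute-force squared distance (fine for a toy grid)
                d2 = min((i - a) ** 2 + (j - b) ** 2 for a, b in ones)
                ch[i][j] = math.exp(-d2 / (2 * sigma ** 2))
        channels.append(ch)
    return channels

fg = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]  # definite-foreground mask
bg = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]  # definite-background mask
gaussian = gaussian_trimap_channels(fg) + gaussian_trimap_channels(bg)
total_channels = 3 + len(gaussian) + 2  # 3 (rgb) + 6 (gaussian) + 2 (binary)
```

With two masks and three scales this yields the 6 Gaussian channels, and together with RGB and the binary trimap gives the 11-channel input.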
When calculating the alpha*F error against the ground truth, I use the foreground they provide.
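As a concrete illustration of that alpha-weighted foreground comparison, here is a minimal sketch with scalar pixels standing in for RGB triples; the mean-squared form is an assumption for illustration, not necessarily the exact metric used in the paper.

```python
# Minimal sketch of an alpha-weighted foreground error: compare
# alpha * F_pred against alpha * F_gt, where F_gt is the foreground
# provided with the dataset (not a re-estimated one). Scalar pixels
# stand in for RGB; the mean-squared form is an assumption.
def alpha_fg_error(alpha, fg_pred, fg_gt):
    n = len(alpha)
    return sum((alpha[i] * fg_pred[i] - alpha[i] * fg_gt[i]) ** 2
               for i in range(n)) / n

err = alpha_fg_error([1.0, 0.5, 0.0], [0.9, 0.6, 0.2], [1.0, 0.8, 0.7])
# pixels with alpha == 0 contribute nothing, so the undefined foreground
# in fully transparent regions does not affect the score
```

Weighting by alpha is what makes the foreground prediction unconstrained (and unpenalized) wherever the pixel is fully transparent.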
Thanks!
For the first question: in the RAdam code you provided, all the default parameters are used except the learning rate, right?
Yes, exactly.
Thanks for providing more details. One more question about the transformed trimap: did you calculate it before cropping, or re-calculate it after cropping?
Hi, @xymsh. Did you complete the reimplementation? I am also reimplementing this. Could we discuss something related via messages? Thank you so much!
Hello, the link https://gist.github.com/MarcoForte/e1518c1d927057ea20ada841ac63642c cannot be accessed now. Could you please update it or commit it to the repo?
Hi, thank you for sharing your code and paper!
I'm currently reproducing your work and have some questions about the training and test details.
In the paper, it is said that the RAdam optimizer is used with momentum 0.9 and weight decay 0.0001. But I didn't find a "momentum" parameter in the official RAdam optimizer code. Did you modify the official code, or just set beta1 to 0.9?
About the weight decay, there are two descriptions in the paper: a) weight decay 0.0001 in the RAdam optimizer, and b) weight decay of 0.005 and 1e-5 applied to convolutional weights and GN parameters. How did you set them in your code? I tried setting the weight decay in the optimizer to 0.0001 and adding an L2 loss on conv weights and on GN weights & biases with weights 0.005 and 1e-5, because I think an L2 loss here is equivalent to weight decay. Is that the same as yours?
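One caveat on the "L2 loss is equivalent to weight decay" assumption above: the two coincide exactly only for plain SGD, where the gradient of the penalty reduces to the decay term:

```latex
\nabla_w\!\left(L(w) + \tfrac{\lambda}{2}\lVert w\rVert^2\right)
  = \nabla_w L(w) + \lambda w
\;\Longrightarrow\;
w \leftarrow w - \eta\,\nabla_w L(w) - \eta\lambda w
```

For adaptive optimizers such as Adam/RAdam, an L2 penalty added to the loss gets rescaled by the adaptive denominator, while decoupled weight decay does not (cf. AdamW), so it is worth checking how the specific RAdam implementation applies its `weight_decay` argument before assuming the two match.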
The input resolution for training patches: patches of size 640, 480, and 320 are randomly cropped during training. After that, did you resize them to a fixed size? If not, how do you train the model with batch size 6?
The input channels for testing: in section 3.6, "During inference, the full-resolution input images and trimaps are concatenated as 4-channel input and fed into the network." But in the code, an 11-channel input is used. Is this a typo?
About the re-estimated foreground: I tried to re-estimate the training foreground images, but only succeeded for 411 of the 431 foregrounds; 20 of them failed to optimize. Did you have the same problem? Also, when calculating the alpha * foreground error during testing, are the ground-truth foreground images re-estimated by closed-form?
Thank you !