Thanks for your attention!
As stated in the README file, 'We only provide pre-trained and fine-tuned models for deraining since RAIN100 datasets only contain hundreds of low-resolution images (insufficient to train transformers)'. For example, there are only 200 low-resolution image pairs in the RAIN100L training set, which is too few to train an EDT-B model (~11M parameters) from scratch.
We found that a large batch does help, though it is still slightly inferior to the multi-task setting. But multi-related-task pre-training can provide an initialization for multiple tasks, thus being more computationally efficient.
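For context, here is a minimal sketch of the two pre-training settings being compared, assuming a PyTorch-style model with scale-specific heads/tails around a shared body; the class, layer choices, and batch split below are illustrative stand-ins inferred from the "single-x2-96" / "multi-x2x3x4-32" naming used later in this thread, not the released EDT code:

```python
import torch.nn as nn

class SharedBodySR(nn.Module):
    """Hypothetical stand-in: scale-specific heads/tails around one shared body."""
    def __init__(self, scales=(2, 3, 4), dim=64):
        super().__init__()
        self.heads = nn.ModuleDict({str(s): nn.Conv2d(3, dim, 3, padding=1) for s in scales})
        # Stand-in for the shared transformer stages (the part reused across tasks).
        self.body = nn.Sequential(*[nn.Conv2d(dim, dim, 3, padding=1) for _ in range(4)])
        self.tails = nn.ModuleDict({str(s): nn.Sequential(
            nn.Conv2d(dim, 3 * s * s, 3, padding=1), nn.PixelShuffle(s)) for s in scales})

    def forward(self, x, scale):
        return self.tails[str(scale)](self.body(self.heads[str(scale)](x)))

model = SharedBodySR()
l1 = nn.L1Loss()

# "single-x2-96": one task, a batch of 96 LR/HR pairs per optimization step.
def single_task_step(lr_x2, hr_x2):           # tensors with batch size 96
    return l1(model(lr_x2, scale=2), hr_x2)

# "multi-x2x3x4-32": per-task batch of 32, i.e. 96 samples per step in total,
# with gradients from all three tasks flowing through the same shared body.
def multi_task_step(batches):                 # {scale: (lr, hr)}, batch size 32 each
    return sum(l1(model(lr, scale=s), hr) for s, (lr, hr) in batches.items())
```

Under this reading, both settings process 96 samples per optimization step; the difference is whether the shared body also sees x3/x4 data during pre-training.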
Thanks for the quick response!
'But multi-related-task pre-training can provide an initialization for multiple tasks, thus being more computationally efficient.'
Are there any experiments to support this? For example, fine-tuning the pre-trained single-x2-96 model and the multi-x2x3x4-32 model on the x3/x4 tasks and then comparing their performance.
There is a misunderstanding. With the x2x3x4-32 pre-training setting, you can fine-tune the x2, x3, and x4 models, respectively. By contrast, the pre-trained single-x2-96 model cannot provide a strict initialization for the x3 and x4 tasks due to the disparity in network structure.
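To illustrate the "strict initialization" point, a hedged sketch continuing the hypothetical SharedBodySR model above (checkpoint paths and key names are assumptions, not the repository's actual format):

```python
import torch

# Fine-tuning target: an x3-only model (hypothetical SharedBodySR from the sketch above).
x3_model = SharedBodySR(scales=(3,))
x3_keys = set(x3_model.state_dict().keys())

# (a) Multi-task x2x3x4 checkpoint: it already contains an x3 head/tail plus the
#     shared body, so every parameter of the x3 model has a pre-trained value.
multi_ckpt = torch.load("pretrain_x2x3x4.pth", map_location="cpu")   # illustrative path
x3_model.load_state_dict({k: v for k, v in multi_ckpt.items() if k in x3_keys})

# (b) Single-task x2 checkpoint: only the shared body matches; the x3 head/tail
#     keys are absent, so a strict load fails and one can at best copy the body
#     (the experiment suggested in this thread), leaving the rest at random init.
x2_ckpt = torch.load("pretrain_x2.pth", map_location="cpu")          # illustrative path
body_only = {k: v for k, v in x2_ckpt.items() if k.startswith("body.")}
missing, unexpected = x3_model.load_state_dict(body_only, strict=False)
print("left at random init:", missing)   # the x3 head/tail parameters
```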
Thanks, I got it. But the transformer stages are the same, right? So, I think it's still worth a try (just my personal opinion).
Another question about Table 2: did you run an experiment in which the training iterations of EDT-B+ (single-task pre-training on 200K images) are three times those of EDT-B* (multi-task pre-training on 200K images)? I think that would be another way to make a fair comparison, instead of increasing the batch size of the single-task model.
Thanks for the nice suggestions. For the transformer body, yes, they are the same. It would be interesting to see whether the pre-trained x2 transformer body is a good initialization for the x3 and x4 tasks. As for extending the training iterations of EDT-B+, I think it is also not a fair comparison, since more optimization steps are introduced. Of course, you are welcome to give it a try.
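A rough accounting of why the extended-iterations variant is still not an apples-to-apples comparison (the iteration count below is arbitrary; only the ratios matter):

```python
N = 100_000   # hypothetical number of pre-training iterations

settings = {
    # name:                     (samples per step, optimization steps)
    "multi-x2x3x4-32":          (3 * 32, N),      # 32 per task, three tasks per step
    "single-x2-96":             (96,     N),      # same samples/step, same step count
    "single-x2-32, 3x iters":   (32,     3 * N),  # the extended-iterations alternative
}

for name, (per_step, steps) in settings.items():
    print(f"{name:24} samples seen: {per_step * steps:,}   optimizer steps: {steps:,}")

# All three settings see the same number of training samples; the third one,
# however, takes three times as many optimizer updates, which is the extra
# variable mentioned above.
```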
Okay, thanks and good luck with ECCV!
Lol! Thanks!
Thanks for your great work! When reading the paper, I had the following questions:
Where can I find the deraining results without pre-training?
From the following table, it seems that the multi-task pre-training strategy has no advantage when the total batch size of single-task pre-training is the same as that of multi-task pre-training. Is that right?
Looking forward to hearing from you!