fenglinglwb / EDT

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

Questions about experimental results #3

Closed zzh-tech closed 2 years ago

zzh-tech commented 2 years ago

Thanks for your great work! While reading the paper, I had the following questions:

  1. Where can I find the deraining results without pretraining?

  2. From the table below, it seems that the multi-task pre-training strategy has no advantage when the total batch size of single-task pre-training matches that of multi-task pre-training. Is that right?

[screenshot of the comparison table]

Looking forward to hearing from you!

fenglinglwb commented 2 years ago

Thanks for your interest!

  1. As stated in the README, 'We only provide pre-trained and fine-tuned models for deraining since RAIN100 datasets only contain hundreds of low-resolution images (insufficient to train transformers)'. For example, there are only 200 low-resolution image pairs in the RAIN100L training set, which makes it difficult to train an EDT-B model (~11M parameters).

  2. We found that the larger batch does help, though it is still slightly inferior to the multi-task setting. More importantly, multi-related-task pre-training provides an initialization for multiple tasks at once, which makes it more computationally efficient (see the sketch below).
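To make the efficiency point concrete, here is a minimal PyTorch sketch of how a single multi-task checkpoint with a shared body and per-scale tails can initialize all three single-scale fine-tuning runs. The modules and key names (`MultiTaskSR`, `SingleTaskSR`, `body`, `tails`) are toy stand-ins, not the actual EDT code.

```python
# Toy stand-ins for a shared transformer body and scale-specific upsampling
# tails; the real EDT modules differ, but the weight-reuse logic is the same idea.
import torch.nn as nn


def make_body(c: int = 64) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c, c, 3, padding=1))


def make_tail(scale: int, c: int = 64) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(c, 3 * scale * scale, 3, padding=1),
                         nn.PixelShuffle(scale))


class MultiTaskSR(nn.Module):
    """Shared body + one tail per scale (the x2x3x4 pre-training setting)."""
    def __init__(self, scales=(2, 3, 4)):
        super().__init__()
        self.body = make_body()
        self.tails = nn.ModuleDict({f"x{s}": make_tail(s) for s in scales})


class SingleTaskSR(nn.Module):
    """Shared body + a single tail (the per-scale fine-tuning setting)."""
    def __init__(self, scale: int):
        super().__init__()
        self.body = make_body()
        self.tail = make_tail(scale)


# Pretend this is the multi-x2x3x4-32 pre-trained checkpoint.
pretrained = MultiTaskSR().state_dict()

# One pre-training run initializes all three fine-tunings: copy the shared
# body plus the matching scale-specific tail into each single-scale model.
for scale in (2, 3, 4):
    model = SingleTaskSR(scale)
    state = {k: v for k, v in pretrained.items() if k.startswith("body.")}
    state.update({k.replace(f"tails.x{scale}.", "tail."): v
                  for k, v in pretrained.items()
                  if k.startswith(f"tails.x{scale}.")})
    model.load_state_dict(state)  # full initialization for every scale
    # ...fine-tune `model` on the corresponding SR data from here.
```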

zzh-tech commented 2 years ago

Thanks for the quick response!

> Multi-related-task pre-training provides an initialization for multiple tasks at once, which makes it more computationally efficient.

Are there any experiments to support this? For example, fine-tuning the pre-trained single-x2-96 model and the multi-x2x3x4-32 model on the x3/x4 tasks and then comparing their performance.

fenglinglwb commented 2 years ago

There is a misunderstanding here. With the x2x3x4-32 pre-training setting, you can fine-tune x2, x3, and x4 models, respectively. By contrast, the pre-trained single-x2-96 model cannot provide a strict initialization for the x3 and x4 tasks, because the network structures differ in the scale-specific parts.

zzh-tech commented 2 years ago

Thanks, I get it now. But the transformer stages are the same, right? So I think it's still worth a try (just my personal opinion).

Another question about Table 2: did you run an experiment where the training iterations of EDT-B+ (single-task pre-training, 200k data) are three times those of EDT-B* (multi-task pre-training, 200k data)? I think that would be another way to make a fair comparison, instead of increasing the batch size of the single-task model.

[screenshot of Table 2]

fenglinglwb commented 2 years ago

Thanks for the nice suggestions. Yes, the transformer bodies are the same. It would be interesting to see whether the pre-trained x2 transformer body is a good initialization for the x3 and x4 tasks (a rough sketch of that experiment is given below). As for extending the training iterations of EDT-B+, I think that is also not a fair comparison, since it introduces more optimization steps. Of course, you are welcome to give it a try.
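For concreteness, here is a rough PyTorch sketch of that body-only transfer. `ToySR`, the checkpoint, and the key names are illustrative assumptions, not the repository's API.

```python
# Toy stand-in: a shared "body" (identical across scales) plus a scale-specific
# "tail" whose shapes differ between x2/x3/x4, which is why a strict load of an
# x2 checkpoint into an x3/x4 model fails.
import torch.nn as nn


class ToySR(nn.Module):
    def __init__(self, scale: int, c: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(c, c, 3, padding=1))
        self.tail = nn.Sequential(nn.Conv2d(c, 3 * scale * scale, 3, padding=1),
                                  nn.PixelShuffle(scale))


# Pretend this is the single-x2-96 pre-trained checkpoint.
pretrained_x2 = ToySR(scale=2).state_dict()

# Transplant only the shared body into an x3 model; the x3 tail stays at its
# random initialization and is learned during fine-tuning.
model_x3 = ToySR(scale=3)
body_only = {k: v for k, v in pretrained_x2.items() if k.startswith("body.")}
missing, _ = model_x3.load_state_dict(body_only, strict=False)
print("randomly initialized keys:", missing)  # only the x3 tail parameters
# ...fine-tune model_x3 and compare against fine-tuning from the multi-task checkpoint.
```

Comparing the resulting x3 model against one fine-tuned from the multi-task checkpoint would show whether the shared body alone carries most of the pre-training benefit.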

zzh-tech commented 2 years ago

Okay, thanks and good luck with ECCV!

fenglinglwb commented 2 years ago

Lol! Thanks!