Algolzw / daclip-uir

[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.
https://algolzw.github.io/daclip-uir
MIT License
638 stars 30 forks

When will the training code release? I can not wait to try it! #1

Closed FrontierBreaker closed 12 months ago

FrontierBreaker commented 12 months ago

Awesome work!

Algolzw commented 12 months ago

Thanks!

The pretrained weights of DA-CLIP are here, and you can use them to generate degradation embeddings and clean-image embeddings as shown in the example code.
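For reference, extracting the two embeddings looks roughly like this. This is a minimal sketch following the repo's example usage: the checkpoint path, the `daclip_ViT-B-32` model name, and the `control=True` return signature are assumptions here, so check the README for the exact interface.

```python
import torch
from PIL import Image
import open_clip

# Assumed checkpoint path and model name registered by the repo's modified open_clip.
checkpoint = 'pretrained/daclip_ViT-B-32.pt'
model, preprocess = open_clip.create_model_from_pretrained('daclip_ViT-B-32', pretrained=checkpoint)

# Preprocess one degraded input image.
image = preprocess(Image.open('degraded_input.png')).unsqueeze(0)

with torch.no_grad():
    # Assumption: with the controller enabled, the image encoder returns both the
    # clean-content embedding and the degradation embedding.
    image_features, degra_features = model.encode_image(image, control=True)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    degra_features /= degra_features.norm(dim=-1, keepdim=True)
```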

In addition, I'm planning to release the training code for DA-CLIP later this month.

FrontierBreaker commented 12 months ago

What's the GPU memory requirement for the training? (I only have a 12 GB GPU, but I would like to try it!)

FrontierBreaker commented 12 months ago

I think this paper would be impactful for the community. : )

Algolzw commented 12 months ago

Thanks again for your interest! I trained DA-CLIP on 4 A100 GPUs (a large batch size is needed for the contrastive learning) and used a single A100 for the downstream image restoration training. The training details can be found in the paper (Appendix B.1).

Training this model on only a 12 GB GPU might be hard (but of course you can reduce the batch size, patch size, and model size to fit your hardware). Moreover, I will provide pretrained weights for the downstream diffusion model so you can easily test it.

FrontierBreaker commented 12 months ago

Got it. Thanks for your timely reply and the suggestions! By the way, what were the training time on 4 × A100 and the per-GPU memory cost in your experiments?

Algolzw commented 12 months ago

The training takes about 3 hours and uses almost all of the GPU memory. :)

But that is with a batch size of 784 × 4. You can try a smaller batch size, such as 64, which should also work well.
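If even a small batch doesn't fit in 12 GB, a generic workaround (not from this repo) is gradient accumulation, which approximates a larger effective batch for the downstream restoration training; note it is not equivalent for the contrastive DA-CLIP loss, since the in-batch negatives still come only from each micro-batch. A self-contained sketch with toy stand-ins for the real model and data:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real restoration model and dataset; the accumulation pattern is what matters.
model = nn.Linear(32, 32)
loss_fn = nn.MSELoss()
loader = DataLoader(TensorDataset(torch.randn(256, 32), torch.randn(256, 32)), batch_size=8)

accum_steps = 8  # 8 micro-batches of 8 approximate an effective batch of 64
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = loss_fn(model(inputs), targets) / accum_steps  # scale so the accumulated gradient is an average
    loss.backward()                                       # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```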

FrontierBreaker commented 12 months ago

Got it. Thank you!