Algolzw / daclip-uir

[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.
https://algolzw.github.io/daclip-uir
MIT License

Seems to be working well only on test images #6

Closed · denisbondare closed this issue 11 months ago

denisbondare commented 11 months ago

I'm using Google Colab with Gradio to test the project, and it does almost nothing on my custom images (I tried several different degradations); it only adds some noise. It works very well only on the test images provided with the project. Is there possibly a mistake in the app.py script, or anywhere else in the pipeline, when testing with custom images? Or is the pretrained model just that limited?

Algolzw commented 11 months ago

Hello, which images/degradations do you want to test with our model? Like other deep-learning-based approaches, our model may be limited by the training datasets, which were specifically collected from different image restoration tasks (as shown in Appendix A of our paper). It works well on images captured similarly to the training data, but it can also produce unsatisfactory results due to data distribution shifts.

denisbondare commented 11 months ago

I'm testing the same kinds of degradations as presented in the paper: manually masked (uncompleted), motion-blurred, hazy, and rainy. So if everything is right in the code and the model is really pretrained to generalize, it should be fine; at least some examples should show results of similar quality. So far I couldn't make it work on any image that isn't already included in the project. It does practically nothing except add some extra noise, plus occasional unexpected behavior like removing a random reflection from the picture. The problem might be in recognizing the degradation pattern. Is there a way to choose the degradation type manually instead of relying on automatic recognition?
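
For illustration, what I have in mind is something like the minimal sketch below. This is only a guess that uses a plain open_clip text encoder as a stand-in; the model name, the prompt strings, and the way the embedding would be passed into the restoration network are all assumptions on my side, not the actual daclip-uir interface.

```python
# Sketch: derive a degradation embedding from a manually chosen text prompt
# instead of the automatic image-based recognition. Purely illustrative;
# the real DA-CLIP checkpoint and conditioning hook may differ.
import torch
import open_clip

# A stock open_clip model stands in for the project's DA-CLIP weights here.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Candidate degradation types from the paper; exact prompt wording is assumed.
degradations = ["motion-blurry", "hazy", "rainy", "uncompleted"]
chosen = "uncompleted"  # manually selected instead of auto-recognized

with torch.no_grad():
    tokens = tokenizer([chosen])
    degra_context = model.encode_text(tokens)              # (1, embed_dim)
    degra_context = degra_context / degra_context.norm(dim=-1, keepdim=True)

# degra_context would then have to replace the controller's image-predicted
# degradation embedding when calling the restoration model; whether the
# sampler exposes such a hook is an assumption about the daclip-uir code.
print(degra_context.shape)
```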

Algolzw commented 11 months ago

OK, can you share some examples so I can test them on my server?

denisbondare commented 11 months ago

Here's a screenshot of my notebook, with the images I use and the results I get. It seems the last one gets some noise-reduction treatment, but what confuses me the most are the uncompleted examples. Inpainting works so easily on the provided example images, but I tried it on several images I made myself and the model doesn't seem to recognize them as incomplete.

[Attachments: Screenshot_3, test1, result1, test2, result2, test3, result3]

Algolzw commented 11 months ago

Aha, I think the results on uncompleted images aren't good because we only trained our model with a face inpainting dataset. Maybe I can add more general images to the training set to improve generalization. :)

denisbondare commented 11 months ago

I tried it on a face example; nothing like the paper results. If it really is only a training problem, then the model definitely needs more training on larger data, because the results in the paper look very impressive, yet the pretrained model doesn't seem ready for actual usage. At the very least, a note in the documentation about the limitations of the currently available pretrained model would be very useful to avoid confusion; otherwise it seems kind of misleading :( Thank you for your work, the results in the paper are actually very good!

[Attachments: Screenshot_4, téléchargement (1)]

Algolzw commented 11 months ago

Yes, I believe it's a training problem, since we only collected a limited number of images for training and testing from other task-specific papers (as listed in the table). Most images within a given dataset were captured with the same device or generated by the same synthetic approach, which currently makes the model hard to use in real applications. (Unlike Stable Diffusion, which is trained on billions of images from different websites.)

[Image attachment: dataset table from the paper]

However, universal image restoration is still an under-explored problem and definitely deserves more work. I am happy to incorporate your comments into future work (more datasets, image sources, and degradations) and release a stronger version of the model.

Thanks!