Algolzw / daclip-uir

[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.
https://algolzw.github.io/daclip-uir
MIT License

It does NOT work with any NOVEL DATA!!!!! It ONLY works with the EXAMPLES #8

Closed · fritol closed this issue 7 months ago

fritol commented 8 months ago

Did you publish the wrong checkpoint? Or is this a bogus project?

Algolzw commented 8 months ago

Hi! Our current model does not support real-world images/applications well, as you have found 🙁. This is likely because our training dataset is collected from other task-specific image restoration projects (you can find them in our paper, Appendix A), which contain relatively few image pairs and thus lack the generalization ability needed for your custom images.

We actually mentioned this problem in the notice section of our documentation, and we also provided more examples from our test dataset to illustrate the method.

If you want the model to be more powerful and reliable for common real-world pictures, one possible solution is to collect more images from different sources and adopt the training strategy of Real-ESRGAN or DiffBIR.
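
For reference, here is a rough sketch of what such a Real-ESRGAN-style synthetic degradation pipeline looks like (the blur, noise, and JPEG parameters below are illustrative placeholders, not the values used by Real-ESRGAN or DiffBIR):

import cv2
import numpy as np

def degrade(img):
    # img: float HxWx3 array in [0, 1].
    h, w = img.shape[:2]

    # 1. Random Gaussian blur.
    k = int(np.random.choice([3, 5, 7]))
    img = cv2.GaussianBlur(img, (k, k), 0)

    # 2. Random downsample, then upsample back to the original size.
    s = np.random.uniform(0.25, 1.0)
    img = cv2.resize(img, (max(1, int(w * s)), max(1, int(h * s))),
                     interpolation=cv2.INTER_AREA)
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)

    # 3. Additive Gaussian noise.
    img = np.clip(img + np.random.normal(0, 0.02, img.shape), 0., 1.)

    # 4. JPEG compression round-trip.
    q = int(np.random.randint(50, 95))
    _, buf = cv2.imencode('.jpg', (img * 255).astype(np.uint8),
                          [cv2.IMWRITE_JPEG_QUALITY, q])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR) / 255.

Real-ESRGAN applies a chain like this twice (its "second-order" degradation model), which is what makes the resulting training pairs resemble real-world degradations.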

Again, this project is mainly intended for academic purposes. Universal Image Restoration is a new concept and deserves more exploration and attention. We are really happy to know that many people like this idea and want to try their own images. Thank you! We will keep improving our model and making it more practical in future work!

Denys88 commented 8 months ago

So is it just overfitting the training data? I tried painting some white lines on a random image and it didn't work. It also didn't help with blurring.

Algolzw commented 8 months ago

Inpainting should work on faces, since we only use CelebA-HQ-256 in training. Moreover, our model also works well on test images, and we are not aiming to overfit the training data. For deblurring, we use GoPro, a synthetic dataset in which each blurry image is generated by averaging several neighboring frames to simulate motion blur.
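
To illustrate the frame-averaging idea, here is a minimal sketch (the frame paths are hypothetical, and the actual GoPro pipeline is more careful, e.g. it accounts for the camera response function, so this is only the rough principle):

import cv2
import numpy as np

# Hypothetical paths to consecutive high-fps video frames.
frames = [cv2.imread(f'frames/{i:04d}.png').astype(np.float64) / 255.
          for i in range(7)]

blurry = np.mean(frames, axis=0)   # temporal average simulates motion blur
sharp = frames[len(frames) // 2]   # middle frame serves as the sharp target

cv2.imwrite('blurry.png', (blurry * 255).astype(np.uint8))
cv2.imwrite('sharp.png', (sharp * 255).astype(np.uint8))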

As requested, I will try to collect more image pairs from different datasets to make our model capable of handling more real degradation types.

BTW, we also found that directly resizing input images leads to poor performance on most tasks. We will try adding a resize step to the training pipeline to improve generalization.
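
A minimal sketch of such a resize augmentation, for illustration only (this is not the project's actual training code):

import cv2
import numpy as np

def random_resize(img, scale_range=(0.5, 2.0)):
    # Rescale by a random factor and resize back, so the model sees
    # resampling artifacts during training.
    h, w = img.shape[:2]
    s = np.random.uniform(*scale_range)
    img = cv2.resize(img, (max(1, int(w * s)), max(1, int(h * s))),
                     interpolation=cv2.INTER_AREA if s < 1. else cv2.INTER_LINEAR)
    return cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)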

Denys88 commented 8 months ago

That's what I tried: I manually scaled images to 256x256, but scaling didn't help at all. For the blur I might have an explanation: there are different sources of blur; your example is related to movement, but I tested on a photo taken with wrong lens parameters. Could you try adding some noise and/or white lines to a random image from the internet and check it too?

Algolzw commented 8 months ago

@Denys88 Yes, for face inpainting you can choose images of size 256x256. I manually downloaded an image and added a mask to it; the result is below:

[Screenshot: inpainting result on a downloaded image (2023-10-12)]

The mask-adding code is:

import cv2
import numpy as np
import os

def add_random_mask(img, size=None, mask_root='inpainting_masks', mask_id=-1, n=100):
    # Pick a random mask file if no id is given.
    if mask_id < 0:
        mask_id = np.random.randint(n)

    # Load the mask and scale it to [0, 1] (1 = keep pixel, 0 = masked out).
    mask = cv2.imread(os.path.join(mask_root, f'{mask_id:06d}.png')) / 255.
    if size is None:
        # cv2.resize expects (width, height), i.e. (cols, rows).
        mask = cv2.resize(mask, (img.shape[1], img.shape[0]), interpolation=cv2.INTER_AREA)
    else:
        # Resize the mask to `size` (height, width), then crop a random window
        # matching the image; the +1 keeps randint's upper bound valid when
        # the mask and image are the same size.
        mask = cv2.resize(mask, (size[1], size[0]), interpolation=cv2.INTER_AREA)
        rnd_h = np.random.randint(0, max(1, size[0] - img.shape[0] + 1))
        rnd_w = np.random.randint(0, max(1, size[1] - img.shape[1] + 1))
        mask = mask[rnd_h : rnd_h + img.shape[0], rnd_w : rnd_w + img.shape[1]]

    # Keep pixels where mask == 1 and fill the masked region with white.
    return mask * img + (1. - mask)

im_name = 'Elon-Musk-256x256.jpg'
im = cv2.imread(f'images/{im_name}') / 255.
masked_im = add_random_mask(im) * 255
cv2.imwrite(f'LQ_images/{im_name}', masked_im)

For blurry images, I don't know how to choose them from the internet. Maybe I should retrain the model with a real motion-blur dataset (but paired motion-blur datasets are really difficult to find or create, which is why we always use synthetic images).

AlexanderKozhevin commented 8 months ago

I tried it on Replicate. It did nothing to the image 🤔

[Screenshot: Replicate run with no visible change (2023-10-14)]

Algolzw commented 8 months ago

Hi, our model for inpainting is trained on the CelebA-HQ face dataset (with only one aligned face per image). You can find more information here: https://www.kaggle.com/datasets/badasstechie/celebahq-resized-256x256. Here is an example:

[Screenshot: face inpainting example (2023-10-14)]
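
For anyone preparing their own test image, a minimal sketch (the file names are hypothetical, and it assumes the face is already roughly centered, since no alignment is performed here):

import cv2

img = cv2.imread('photo.jpg')  # hypothetical input with a single face
h, w = img.shape[:2]
side = min(h, w)

# Center-crop to a square, then resize to the 256x256 training resolution.
crop = img[(h - side) // 2 : (h + side) // 2,
           (w - side) // 2 : (w + side) // 2]
cv2.imwrite('images/photo_256.jpg',
            cv2.resize(crop, (256, 256), interpolation=cv2.INTER_AREA))
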
LukaGiorgadze commented 5 months ago

@Algolzw Any future plans to provide pre-trained models for real-life use cases?

Algolzw commented 5 months ago

@LukaGiorgadze I will provide slightly better weights (for resized images) later this month.