IDKiro / DehazeFormer

[IEEE TIP] Vision Transformers for Single Image Dehazing
MIT License

OTS dataset #9

Closed kiriiiia closed 2 years ago

kiriiiia commented 2 years ago

I found that some images appear in both the training set and the test set. Could you please provide data cleaning code for OTS, if it's convenient for you?

IDKiro commented 2 years ago

If you mean that the training and test sets in the outdoor experimental setup contain the same samples, then yes, indeed.

This data leakage is mainly caused by the flawed train/test split of the RESIDE dataset (it provides two versions of OTS, both with data leakage). We simply followed the experimental setup of previous works, although it is quite unreasonable.

If you want to follow our (or other peers') work, I suggest you do not change the experimental setup. If you insist on doing so, please remove the training images whose id also appears in the test set yourself. It is very simple.

kiriiiia commented 2 years ago

I already know how to deal with it. Thank you for your reply!

IDKiro commented 2 years ago
import os
import shutil

if __name__ == '__main__':
    # Print each image in the train set of RESIDE-OUT whose id also appears
    # in the test set, and move it to a backup directory.

    for suffix in ['GT', 'hazy']:
        refer_dir = 'data/RESIDE-OUT/test/' + suffix
        target_dir = 'data/RESIDE-OUT/train/' + suffix
        backup_dir = 'data/RESIDE-OUT/backup/' + suffix

        os.makedirs(backup_dir, exist_ok=True)

        refer_img_names = os.listdir(refer_dir)
        target_img_names = sorted(os.listdir(target_dir))

        # The image id is the part of the file name before the first '_'.
        # A set keeps the membership test below fast.
        refer_img_ids = {name.split('_')[0] for name in refer_img_names}

        for target_img_name in target_img_names:
            if target_img_name.split('_')[0] in refer_img_ids:
                print(target_img_name)

                shutil.move(
                    os.path.join(target_dir, target_img_name),
                    os.path.join(backup_dir, target_img_name)
                )
IDKiro commented 2 years ago

Because it's really simple, I also wrote some code for reference.
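
Before moving any files, it may help to check how many training images would be affected. The following is only a minimal sketch of such a dry run, not part of the code posted in this thread. The helper names `image_id` and `find_leaked_train_images` are hypothetical, and it assumes the same `data/RESIDE-OUT` layout and the same id convention (the text before the first underscore) as the script above.

```python
import os


def image_id(file_name):
    """Return the part of the file name before the first underscore."""
    return file_name.split('_')[0]


def find_leaked_train_images(train_dir, test_dir):
    """List train files whose id also occurs in test_dir, without moving anything."""
    test_ids = {image_id(name) for name in os.listdir(test_dir)}
    return sorted(
        name for name in os.listdir(train_dir) if image_id(name) in test_ids
    )


if __name__ == '__main__':
    root = 'data/RESIDE-OUT'  # assumed: same layout as the script above
    for suffix in ['GT', 'hazy']:
        train_dir = os.path.join(root, 'train', suffix)
        test_dir = os.path.join(root, 'test', suffix)
        if not (os.path.isdir(train_dir) and os.path.isdir(test_dir)):
            print(suffix, 'directories not found, skipping')
            continue
        leaked = find_leaked_train_images(train_dir, test_dir)
        total = len(os.listdir(train_dir))
        print(suffix, len(leaked), 'of', total, 'train files overlap with test')
```

Since this only lists file names, it can be run repeatedly and compared against what the move script prints.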