UCSB-NLP-Chang / CoPaint

Implementation of paper 'Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models'

Issues using download script trying to guarantee the same splits for CelebA-HQ #8

Open — djburnett opened 2 months ago

While attempting to use scripts/download.sh to reproduce your dataset split for the CelebA-HQ 256x256 dataset, I ran into two issues that prevent me from determining the exact splits used.

The first issue is that temp_train_shuffled.flist is created with the shuf command, which produces a different ordering on every run. As a result, the lama-celeba/train_shuffled.flist and lama-celeba/val_shuffled.flist files generated when I ran the script cannot be guaranteed to match those used in the paper. Could you upload your generated versions of those two files, or your generated temp_train_shuffled.flist? That would let me be certain the dataset split in my experiments is consistent with yours.
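For what it's worth, a standard way to make this reproducible going forward would be to seed shuf via its `--random-source` option (GNU coreutils). The sketch below is not from the repo; the seed value and file names are illustrative, and it assumes GNU shuf and openssl are available:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch: a reproducible replacement for the unseeded `shuf`
# call in scripts/download.sh. GNU shuf accepts --random-source, so feeding
# it a deterministic byte stream derived from a fixed seed makes the split
# identical across runs.
get_seeded_random() {
  seed="$1"
  # openssl turns /dev/zero into an endless byte stream fully determined
  # by the seed; shuf only reads as many bytes as it needs.
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null
}

# Toy demonstration: the same seed always yields the same ordering.
printf 'img0.png\nimg1.png\nimg2.png\nimg3.png\n' > temp_train.flist
shuf --random-source=<(get_seeded_random 42) temp_train.flist > temp_train_shuffled.flist
```

With a fixed seed baked into the script, anyone re-running download.sh would recover the same train/val split without needing the original .flist files.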

The second issue is that the Google Drive link for downloading the dataset (the one that fetches data256x256.zip) no longer works. While searching for the dataset on sources such as Kaggle, I noticed that the images are often reordered across different uploads. I am currently using a 256x256 version of CelebAMask-HQ, which indexes images consistently with the ordering of CelebA-HQ on TensorFlow.org (and with the full-size CelebAMask-HQ). Could you verify whether the ordering you used is consistent with that, or otherwise provide a working URL for data256x256.zip from CelebA-HQ?
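One way to settle the ordering question, if you still have your local copy, would be to compare per-index checksums between the two copies. This is only a sketch under my assumptions: the directory names, file naming scheme (`<index>.jpg`), and index range are placeholders, not the repo's actual layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: two copies of CelebA-HQ 256x256 use the same ordering
# iff the file at each index has the same checksum in both copies.
same_ordering() {
  dir_a="$1"; dir_b="$2"; n="$3"
  i=0
  while [ "$i" -lt "$n" ]; do
    a=$(md5sum "$dir_a/$i.jpg" | cut -d' ' -f1)
    b=$(md5sum "$dir_b/$i.jpg" | cut -d' ' -f1)
    if [ "$a" != "$b" ]; then
      echo "index $i differs"
      return 1
    fi
    i=$((i + 1))
  done
  echo "orderings match for first $n images"
}
```

Even spot-checking a handful of indices this way (rather than all 30,000 images) would likely catch a reordered upload, since mismatched orderings tend to differ at most positions.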


The reason I am concerned is that, as I understand it, you used the training split to train the diffusion model. If I reuse that model, I want to be certain I am not accidentally validating or testing my work on data from the diffusion model's training set.