HighwayWu / LASTED

Synthetic Image Detection
MIT License

Train data and Openset issues #2

Closed by radi-cho 1 year ago

radi-cho commented 1 year ago

Hello. The Practical test set seems to contain only paintings, not real/fake photos, so I was wondering whether it would be better to benchmark the model on a combination of the two. I then looked at the provided Openset test set and was confused that all the image names are prefixed with real_. Aren't all the images there generated by GANs/DMs? What does real_ stand for in the image names?

Finally, to make it possible for other studies to compare against yours, could you provide the full training set (or scripts to generate it from the datasets mentioned in the paper, so that it has no overlap with the provided test sets)? I would like to reproduce your results by training LASTED on my end and then modify the method to see whether improvements can be made. In that regard, providing any random seeds, etc. used during benchmarking would be appreciated as well, so that the same metric values can be obtained.
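For context on the seed request, here is a minimal sketch of what pinning random seeds usually involves in a Python training script. It is not taken from the LASTED repository: the helper name `set_seed` and the seed value are hypothetical, and the NumPy and PyTorch branches assume the training code uses those libraries (they are skipped if the libraries are not installed).

```python
import os
import random


def set_seed(seed: int) -> None:
    """Seed the random number generators a training run typically touches.

    Hypothetical helper for illustration; not part of the LASTED code base.
    """
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np  # assumption: the pipeline may use NumPy

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # assumption: the model may be trained with PyTorch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass


# 0 is an arbitrary placeholder, not a value published by the authors.
set_seed(0)
first = random.random()
set_seed(0)
second = random.random()
print(first == second)
```

Even with identical seeds, results can still differ slightly across hardware and library versions, so matching the published metrics exactly may also require the same environment.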

HighwayWu commented 1 year ago

Thanks for your attention.

1) Since the Openset test set already includes a combination of real/fake photos and paintings, we only collected some real/fake paintings for the Practical test set.

2) The folder names in the Openset test set indicate the data sources (e.g., ImageNet_VISIONBigGAN means the real data are from ImageNet and VISION, while the fake data are generated by BigGAN). The “real” is just there for more convenient metric calculation, as we only need to classify/cluster the data into “real” and “fake” classes.

3) Sure. We will release all the training data soon.

HighwayWu commented 1 year ago

The training dataset can be downloaded from the following link. Due to Google Drive's capacity limitations, we are currently only able to share it via Baidu Pan: https://pan.baidu.com/s/1ZgSiNX_Dd7cZcwiHv1ujVg?pwd=h4p7

radi-cho commented 1 year ago

Thank you.