HighwayWu / LASTED

Synthetic Image Detection
MIT License

Question about the dataset used in the paper #5

Closed by Woodyet 11 months ago

Woodyet commented 11 months ago

Hello,

Great paper and great results; thanks for your contribution to the field.

I was wondering if you could answer a question I have about the datasets used for training. In the paper you say:

"We form the training dataset by including four categories of data, namely, real photos from LSUN [79], real paintings from Danbooru [3], synthetic photos by ProGAN [49], and synthetic paintings by Stable Diffusion (SD) [9, 11] from [6].

The image synthesis models ProGAN and SD here are deliberately trained on LSUN and Danbooru, respectively, forcing the detector to learn more discriminative representations from visually similar real and synthetic images."
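For concreteness, this four-source composition might be assembled as in the following minimal sketch (not the authors' code): the directory names are hypothetical placeholders, and plain binary real/synthetic labels are used purely for illustration, which may differ from the paper's actual supervision.

```python
import glob
import os

from PIL import Image
from torch.utils.data import ConcatDataset, DataLoader, Dataset
from torchvision import transforms

class FlatImageFolder(Dataset):
    """Images directly under `root`, each returned with one fixed label."""
    def __init__(self, root, label, transform):
        self.paths = sorted(
            p for p in glob.glob(os.path.join(root, "*"))
            if p.lower().endswith((".png", ".jpg", ".jpeg"))
        )
        self.label = label
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        img = Image.open(self.paths[i]).convert("RGB")
        return self.transform(img), self.label

tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

REAL, FAKE = 0, 1  # illustrative binary labels, not necessarily the paper's scheme
train_set = ConcatDataset([
    FlatImageFolder("data/lsun_real", REAL, tf),      # real photos (LSUN)
    FlatImageFolder("data/danbooru_real", REAL, tf),  # real paintings (Danbooru)
    FlatImageFolder("data/progan_fake", FAKE, tf),    # synthetic photos (ProGAN)
    FlatImageFolder("data/sd_fake", FAKE, tf),        # synthetic paintings (SD)
])
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)
```

ConcatDataset keeps the four sources separate on disk while presenting one shuffled stream, so each batch mixes visually similar real and synthetic images, which is the property the quoted passage relies on.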

I wonder why you chose not to train both ProGAN and SD on LSUN and Danbooru together? Surely, by training both methods on both domains, you would get more coverage of the "fake" space and hence better generalisation.

Additionally, I just want to confirm: did this work require you to train a ProGAN model on LSUN data and a Stable Diffusion model on Danbooru data? Or did you obtain these models from previous works, and if so, where did you obtain them?

Thanks!

HighwayWu commented 11 months ago

Hi, thanks for your interest.

I agree that training both ProGAN and SD on both LSUN and Danbooru would produce better synthesized results (for both photos and paintings). However, considering the time cost, and since our focus is on training a general feature extractor, we directly used the existing models/datasets from [1][2]. We look forward to a broader benchmark in the future.

[1] ProGAN & LSUN: "CNN-generated images are surprisingly easy to spot... for now" in CVPR'20. [2] SD trained on Danbooru: https://lexica.art/

Woodyet commented 11 months ago

Thank you immensely; I hope to be adding to the repo soon :)