HighwayWu / LASTED

Synthetic Image Detection
MIT License

Do you have a paper/technical report to refer to more implementation details? #14

Open xiankgx opened 6 months ago

xiankgx commented 6 months ago

It seems like you are using CLIP with four possible textual descriptions and then classifying via cosine similarity, just as CLIP does. However, unlike CLIP, where the cardinality of the label set (i.e., the number of possible text sentences) is practically unlimited, at least during training, in LASTED it is only 4. I wonder how much of an uplift there is, training on the same CLIP image encoder, from LASTED versus something like just adding a classifier head on top of the CLIP image encoder trained with a standard multi-class categorical cross-entropy loss.
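To make the comparison concrete, here is a minimal NumPy sketch (not the repo's code; the feature dimension, temperature, and class count are placeholder assumptions) showing that with a fixed set of four text embeddings, cosine-similarity classification reduces to a linear head whose weight rows are the L2-normalized text embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_classes = 512, 4          # hypothetical CLIP feature dim; LASTED's 4 prompts
temperature = 0.01               # hypothetical logit scale

# Stand-ins for CLIP image/text encoder outputs.
image_feat = rng.normal(size=d)
text_feats = rng.normal(size=(num_classes, d))

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# CLIP/LASTED-style head: cosine similarity against the 4 text embeddings.
cos_logits = l2norm(image_feat) @ l2norm(text_feats).T / temperature

# Equivalent linear head: weights are the scaled, normalized text embeddings, no bias.
W = l2norm(text_feats) / temperature
lin_logits = l2norm(image_feat) @ W.T

assert np.allclose(cos_logits, lin_logits)
pred = int(np.argmax(cos_logits))
```

So with only 4 fixed prompts the text branch acts like a (constrained) linear classifier; the interesting question is whether training the text encoder end-to-end buys anything over learning `W` directly with cross-entropy.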

xiankgx commented 6 months ago

I also wonder what happens if you augment the labels during training. For example, the text label for an AI image could be randomly selected from, say:

Perhaps something like this would make use of the text modality a little more and boost performance?
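A sketch of that label-augmentation idea (the prompt pools below are made-up examples, not from the paper): at each training step, sample one of several paraphrased prompts for the image's class, so the text encoder sees varied supervision for the same label:

```python
import random

# Hypothetical paraphrase pools per class; actual wording would need tuning.
PROMPTS = {
    "real": [
        "a real photo",
        "an authentic photograph",
        "a camera-captured image",
    ],
    "synthetic": [
        "an AI-generated image",
        "a synthetic picture",
        "an image created by a generative model",
    ],
}

def sample_prompt(label: str, rng: random.Random = random) -> str:
    """Pick a random paraphrase of the label's textual description."""
    return rng.choice(PROMPTS[label])

rng = random.Random(0)
batch_labels = ["real", "synthetic", "synthetic"]
batch_prompts = [sample_prompt(y, rng) for y in batch_labels]
```

Each sampled prompt would then be encoded by the text encoder in place of the single fixed description for that class.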