xiankgx opened 7 months ago
I also wonder what happens if you augment the labels during training, e.g., the text label for an AI-generated image could be randomly sampled from a pool of synonymous phrasings. Perhaps something like this would make use of the text modality a little more and boost performance? A minimal sketch of what I mean is below.
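To be concrete, here is a minimal sketch of that label-augmentation idea. The phrasings in `LABEL_POOLS` and the helper `sample_label` are made up for illustration; they are not the prompts used in LASTED:

```python
import random

# Hypothetical pools of synonymous phrasings per class; the exact wording
# here is invented for illustration, not taken from the LASTED paper.
LABEL_POOLS = {
    "real": [
        "a real photo",
        "an authentic photograph",
        "a picture taken by a camera",
    ],
    "ai": [
        "an AI-generated image",
        "a synthetic image",
        "an image created by a generative model",
    ],
}

def sample_label(class_name: str) -> str:
    """Draw a fresh phrasing for this class at every training step."""
    return random.choice(LABEL_POOLS[class_name])

# e.g., each training batch pairs an image with a freshly sampled caption:
caption = sample_label("ai")
```

The hope would be that varying the text seen for each class acts like data augmentation on the text side, so the model relies on the semantics of the label rather than memorizing four fixed sentences.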
It seems like you are using CLIP with 4 possible textual descriptions and then using cosine similarity for classification, just like CLIP. However, unlike CLIP, where the cardinality of the labels (i.e., the number of possible text sentences) is practically unlimited, at least during training, in LASTED it is only 4. I wonder how much of an uplift there is if we train the same CLIP image encoder both ways: LASTED versus something like just adding a classification head on top of the CLIP image encoder, trained with a standard multi-class categorical cross-entropy loss. A rough sketch of both setups follows.
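To make the comparison concrete, here is a rough sketch of the two setups using the Hugging Face CLIP implementation. The four prompts are placeholders, not LASTED's actual descriptions, and this is my guess at the baseline, not the repo's training code:

```python
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder prompts; LASTED's actual four descriptions may differ.
PROMPTS = ["real photo", "synthetic photo", "real painting", "synthetic painting"]

# Setup 1: prompt-based classification via cosine similarity (CLIP/LASTED-style).
@torch.no_grad()
def classify_with_prompts(images):
    inputs = processor(text=PROMPTS, images=images, return_tensors="pt", padding=True)
    out = model(**inputs)
    # logits_per_image = image-text cosine similarities scaled by CLIP's temperature
    return out.logits_per_image.softmax(dim=-1)

# Setup 2: a plain classification head on top of the (here frozen) image encoder,
# trained with standard categorical cross-entropy.
class LinearHeadBaseline(nn.Module):
    def __init__(self, clip_model: CLIPModel, num_classes: int = 4):
        super().__init__()
        self.clip = clip_model
        for p in self.clip.parameters():
            p.requires_grad = False  # freeze; or unfreeze to match LASTED's training budget
        self.head = nn.Linear(clip_model.config.projection_dim, num_classes)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        feats = self.clip.get_image_features(pixel_values=pixel_values)
        return self.head(feats)  # feed the logits to nn.CrossEntropyLoss
```

The interesting question is whether the language-guided contrastive supervision in setup 1 actually generalizes better to unseen generators than the plain cross-entropy head in setup 2, given the same image encoder and training data.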