In docmsu_all.json, does the 'is_sar' key indicate whether a sample (news, image) is sarcastic?
I loaded the JSON file and found only about 7,000 samples with is_sar = 1, while the data statistics in the paper report about 30k sarcastic samples. I wonder what is going wrong. Thanks.
Yes, is_sar=1 denotes that the image-text pair is sarcastic. As mentioned in the paper, we use GPT to augment the text data. The current release (7k) is the fully manually labeled version before augmentation, which is why it contains fewer sarcastic samples than the paper reports.
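For anyone who wants to reproduce the count, here is a minimal sketch. The layout of docmsu_all.json is not described in this thread, so the code assumes the top level is either a list of records or a dict mapping ids to records, each record being a dict with an 'is_sar' field. The helper names `load_records` and `count_labels` are hypothetical, not part of the released code.

```python
import json
from collections import Counter


def load_records(path):
    """Load the JSON file and return its records as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: top level is a list of records, or a dict of id -> record.
    return list(data.values()) if isinstance(data, dict) else list(data)


def count_labels(records, key="is_sar"):
    """Count how many records carry each value of `key` (None if missing)."""
    return Counter(rec.get(key) for rec in records)
```

Calling `count_labels(load_records("docmsu_all.json"))` should return a counter whose entry for 1 is the number of sarcastic pairs in the current release. If the file uses a different structure, `load_records` would need to be adjusted.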