fesvhtr / DocMSU

[AAAI 2024] Official repository of the paper "DocMSU: A Comprehensive Benchmark for Document-level Multimodal Sarcasm Understanding"

Sarcasm sample proportion #1

Open Shelly-zzz opened 3 months ago

Shelly-zzz commented 3 months ago

In docmsu_all.json, does the 'is_sar' key indicate whether a sample (news, image) is sarcastic? After loading the JSON file, I found only about 7000 samples with is_sar = 1, while the data statistics in the paper report about 30k sarcastic samples. I wonder what is going wrong. Thanks.

fesvhtr commented 3 months ago

Yes, is_sar=1 denotes that the image-text pair is sarcastic. As mentioned in the paper, we use GPT to augment the text data. The current version (7k) is the fully manually labeled version, before augmentation.
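
For anyone who wants to reproduce the count from the question, here is a minimal sketch. The layout of docmsu_all.json is not described in this thread, so the code assumes the top level is either a list of records or a dict mapping ids to records, each record being a dict with an 'is_sar' key. The helper names `load_records` and `count_labels` are made up for illustration and are not part of the repository.

```python
import json
from collections import Counter


def load_records(path):
    """Load the annotation file and return a list of record dicts.

    Assumption: the top-level JSON value is either a list of records
    or a dict whose values are the records.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return list(data.values())
    return data


def count_labels(records, key="is_sar"):
    """Count how often each value of `key` occurs across the records.

    Records that lack the key are counted under None.
    """
    return Counter(rec.get(key) for rec in records)
```

Calling `count_labels(load_records("docmsu_all.json"))` and reading the entry for `1` should give the number of sarcastic pairs in the released file, which according to the reply above is the roughly 7k manually labeled version rather than the augmented set.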