Some labels are duplicate

ant-research / VCSL

Video Copy Segment Localization (VCSL) dataset and benchmark [CVPR2022]

MIT License

Thanks for your great work on the VCSL dataset! When checking the test pairs in split_meta_pairs.json, we found that about 1% of the labels are either self-duplicates (e.g. e45b9eec5ea54e00a2d6e6689cd5fe92-e45b9eec5ea54e00a2d6e6689cd5fe92, dd44c4be9fdc4c95bdf075e7e756294e-dd44c4be9fdc4c95bdf075e7e756294e) or query-reference reverse duplicates, where the same two videos appear again with the query and reference swapped (e.g. dd44c4be9fdc4c95bdf075e7e756294e-0a25c04af29940e5b34676b6c1a7eca1 (label: [[65, 1, 79, 156]]) and 0a25c04af29940e5b34676b6c1a7eca1-dd44c4be9fdc4c95bdf075e7e756294e (label: [[1, 65, 156, 79]])). In addition, the labels of some reverse-duplicate pairs do not match each other (e.g. dd44c4be9fdc4c95bdf075e7e756294e-ca334995c55c45af93dee776799f0433 (label: [[52, 9, 82, 44], [68, 84, 82, 96], [9, 1, 16, 7], [46, 97, 49, 100], [47, 45, 49, 47]]) and ca334995c55c45af93dee776799f0433-dd44c4be9fdc4c95bdf075e7e756294e (label: [[9, 52, 44, 82], [1, 9, 7, 16], [97, 46, 100, 49]]): the first has five segments, the second only three). These duplicate pairs may lead to inaccurate evaluation results. Were these duplicate pairs introduced by mistake or by design? Thank you!

Thank you for the feedback! We consider the "self-duplicate" pairs to be special but valid cases: models are expected to treat a video as "copying" itself. Our annotation process has several stages with interactions between human annotators and algorithms, and occasionally "query-reference reverse duplicates" were included (about 1% of the total pairs). Most of these pairs have equivalent labels, as in your second example, but when the two duplicates were processed by different annotators, the labels could be inconsistent (about 0.3% of the total pairs). We removed the "query-reference reverse duplicates" and re-tested the TransVCL model; the F1-score is 66.41. The difference from the number in the paper (66.51) is pretty small. To clear up any concerns, we will remove those duplicates and update the benchmark. Thanks again!
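For anyone who wants to reproduce the check, here is a minimal sketch of how the three cases discussed above could be separated. It assumes the labels are available as a dict mapping "query-reference" keys to lists of 4-number boxes, and that reversing a pair swaps the first/second and third/fourth coordinates, as the examples in this thread suggest. The function names and the in-memory `example` dict are hypothetical; this does not read split_meta_pairs.json and is not part of the VCSL code.

```python
def swap_box(box):
    """Swap the query and reference coordinates of one box.

    Assumption: [a, b, c, d] becomes [b, a, d, c], matching the
    [[65, 1, 79, 156]] vs [[1, 65, 156, 79]] example in this thread.
    """
    a, b, c, d = box
    return [b, a, d, c]


def find_duplicates(labels):
    """Classify pair keys into self-duplicates and reverse duplicates.

    `labels` maps "query-reference" keys to lists of boxes. Video ids are
    assumed to contain no "-" themselves (true for the ids quoted here).
    """
    self_pairs, consistent, inconsistent = [], [], []
    for key in labels:
        query, ref = key.split("-", 1)
        if query == ref:
            self_pairs.append(key)
            continue
        reverse_key = f"{ref}-{query}"
        # `key < reverse_key` reports each unordered pair only once.
        if reverse_key in labels and key < reverse_key:
            own = sorted(labels[key])
            mirrored = sorted(swap_box(b) for b in labels[reverse_key])
            bucket = consistent if own == mirrored else inconsistent
            bucket.append((key, reverse_key))
    return self_pairs, consistent, inconsistent


# The two reverse-duplicate examples quoted in the issue.
example = {
    "dd44c4be9fdc4c95bdf075e7e756294e-0a25c04af29940e5b34676b6c1a7eca1":
        [[65, 1, 79, 156]],
    "0a25c04af29940e5b34676b6c1a7eca1-dd44c4be9fdc4c95bdf075e7e756294e":
        [[1, 65, 156, 79]],
    "dd44c4be9fdc4c95bdf075e7e756294e-ca334995c55c45af93dee776799f0433":
        [[52, 9, 82, 44], [68, 84, 82, 96], [9, 1, 16, 7],
         [46, 97, 49, 100], [47, 45, 49, 47]],
    "ca334995c55c45af93dee776799f0433-dd44c4be9fdc4c95bdf075e7e756294e":
        [[9, 52, 44, 82], [1, 9, 7, 16], [97, 46, 100, 49]],
}

self_pairs, consistent, inconsistent = find_duplicates(example)
```

On this data the first reverse pair lands in `consistent` and the second in `inconsistent`. Comparing sorted box lists treats segment order as irrelevant, which is a choice made for this sketch rather than something the thread specifies.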

Some labels are duplicate #13