Question about label shuffle

Hi, thanks for your great work!

I have a question about the label shuffle augmentation. According to the paper, when a video contains multiple objects, each instance is processed independently and the results are then merged using softmax aggregation. My question is: why does label shuffle boost performance even though each instance is processed independently?
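
To make my understanding concrete, here is a minimal sketch of how I assume the aggregation step works for a single pixel. The function name `soft_aggregate`, the background term, and the odds-based formulation are my own assumptions based on the soft aggregation commonly used in memory-network VOS models; they are not taken from the repository's code.

```python
import math


def soft_aggregate(instance_probs, eps=1e-7):
    """Merge independently predicted per-instance foreground probabilities
    for ONE pixel into a distribution over [background, inst_1, ..., inst_N].

    Assumption: background probability is the product of (1 - p_i), and the
    merge is a softmax over the logits log(p / (1 - p)) of each channel.
    """
    # Clip to avoid log(0) and division by zero.
    clipped = [min(max(p, eps), 1.0 - eps) for p in instance_probs]

    # A pixel is background only if no instance claims it.
    background = math.prod(1.0 - p for p in clipped)
    background = min(max(background, eps), 1.0 - eps)

    channels = [background] + clipped
    logits = [math.log(p / (1.0 - p)) for p in channels]

    # Numerically stable softmax over the channel logits.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two instances predicted independently at the same pixel.
merged = soft_aggregate([0.9, 0.2])
predicted_label = max(range(len(merged)), key=merged.__getitem__)
```

In this sketch the instances only interact in the final normalization, and swapping their order just permutes the output channels, which is why I am unsure where the gain from label shuffle comes from.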

carrierlxk / GraphMemVOS, issue #10