Hello,
Thank you for sharing your work. I am looking at the finetuning code and trying to understand how mixup and cutmix are applied to videos. The train dataloader seems to provide batches of shape (B, C, T, H, W). However, the mixup function from timm expects a batch of shape (B, C, H, W). I couldn't find any code that reshapes the batch before passing it to the mixup function. Am I missing something? Should the batch be reshaped from (B, C, T, H, W) to (B, C*T, H, W) or to (B*T, C, H, W)? Which is the correct way?
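To make the two options concrete, here is a minimal sketch of what I mean. It uses plain PyTorch with a hypothetical `mixup_batch` helper (not the timm function) and a fixed `lam`, so it only illustrates the reshape question and is not meant to reflect the repository's actual pipeline.

```python
import torch

torch.manual_seed(0)


def mixup_batch(x, lam):
    # Hypothetical helper (an assumption, not timm's API): blend every
    # sample with its partner from the batch reversed along dim 0.
    return lam * x + (1.0 - lam) * x.flip(0)


B, C, T, H, W = 4, 3, 8, 16, 16
video = torch.randn(B, C, T, H, W)
lam = 0.7  # fixed for illustration; normally sampled from a Beta distribution

# Reference: mix the 5D tensor directly (elementwise, dim 0 is the batch).
ref = mixup_batch(video, lam)

# Option 1: fold time into channels, (B, C, T, H, W) -> (B, C*T, H, W).
opt1 = mixup_batch(video.reshape(B, C * T, H, W), lam)
opt1 = opt1.reshape(B, C, T, H, W)

# Option 2: fold time into the batch, (B, C, T, H, W) -> (B*T, C, H, W).
# T has to be moved next to B before reshaping.
frames = video.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
opt2 = mixup_batch(frames, lam)
opt2 = opt2.reshape(B, T, C, H, W).permute(0, 2, 1, 3, 4)

# Compare both options against mixing the 5D tensor directly.
same1 = torch.allclose(opt1, ref)
same2 = torch.allclose(opt2, ref)
print("option 1 matches 5D mixing:", same1)
print("option 2 matches 5D mixing:", same2)
```

If I read this correctly, the first reshape gives the same result as mixing the 5D tensor directly, while the second flips over the combined batch-and-time axis, so each frame is blended with a frame from a different time step, and the labels would presumably have B entries against a batch of B*T. I am not sure which of these, if either, is intended here.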
Thank you.