@RongchangLi Thanks for pointing out this issue. Yes, you're right, this is a mistake in the implementation of the module. The kernel sizes for self.f1_conv3d, self.f2_conv3d, and self.f3_conv2d can all be set to 3. The code lines you pointed out have been updated.
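For concreteness, a minimal sketch of the change (assuming a PyTorch implementation; the channel sizes below are placeholders, and only the kernel sizes reflect the fix described above):

```python
import torch.nn as nn

in_channels, hidden = 2048, 512  # placeholder channel sizes

# Previously the temporal kernel size was 1, so the "3D" convs acted frame-wise;
# setting every kernel size to 3 gives the two 3D branches a real temporal extent.
f1_conv3d = nn.Conv3d(in_channels, hidden, kernel_size=(3, 3, 3), padding=(1, 1, 1))
f2_conv3d = nn.Conv3d(in_channels, hidden, kernel_size=(3, 3, 3), padding=(1, 1, 1))
f3_conv2d = nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1)
```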
After making this change, the results on HMDB-51 show drops of around 0.7% and 0.2% in Open maF1 and Open Set AUC compared to the results reported in the main paper:

Open maF1 | Open Set AUC | Closed Set Acc
---|---|---
76.52 (0.17) | 76.81 | 94.03
But 3D operations seem essential for the debiasing. When the three branches are the same, the model doesn't seem to have the ability to remove the static bias described in the paper. I am confused about why it still works, and why the performance drops after the fix.
If the three branches all use Conv2D, the loss functions will still remove the appearance bias. You could say that, in an implicit way, the middle branch learns the foreground appearance features while the other two branches learn the background appearance features.
As for the performance drop, one explanation is that the Conv3D features are not optimally learned, since I did not change any of the hyperparameters, which were tuned for the previous implementation.
This article is insightful, so I would like to discuss some issues in more detail. Isn't it too subjective to claim that an implicit elimination of appearance bias has occurred? When all three branches are 2D, the inputs to the three branches are high-level features extracted from the same spatio-temporal network, and the three branches have the same simple structure (only one convolution, a pooling layer, and a fully-connected layer). The only difference is that the loss function makes the output of one branch different from the outputs of the other two. Why can we say that this process eliminates appearance bias?
I see your point. In this case, the middle branch and the other two will learn to give two sets of features that are independent of each other but both discriminative for classification. Without an explicit inductive bias in the design (such as Conv2D/shuffle vs. Conv3D), we indeed cannot claim the debiasing effect.
Thank you very much for your patient explanation. It helps a lot.
It seems that the debias head is used to implement CED. However, there appear to be no real 3D operations, even though some modules (self.f1_conv3d, self.f2_conv3d) are named with '3D', because the temporal size of the convolution kernels is 1.
In that case, shuffling the features along the temporal dimension won't actually make any difference, and there seems to be no difference between the three branches:
1. f1_conv3d --> avg_pool --> fc1
2. temporal shuffling --> f2_conv3d --> avg_pool --> fc2
3. reshape --> f3_conv2d --> avg_pool --> fc3
Here is the code in
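To make the point about the shuffle concrete, here is a small sketch (assuming PyTorch and dummy feature sizes; not the repository's actual code): with a temporal kernel size of 1, global average pooling makes the branch invariant to temporal shuffling.

```python
import torch
import torch.nn as nn

feat = torch.randn(2, 64, 8, 7, 7)            # (N, C, T, H, W) dummy backbone features
conv = nn.Conv3d(64, 32, kernel_size=(1, 3, 3), padding=(0, 1, 1))  # temporal kernel size 1

perm = torch.randperm(feat.shape[2])          # temporal shuffle
pooled = conv(feat).mean(dim=(2, 3, 4))
pooled_shuffled = conv(feat[:, :, perm]).mean(dim=(2, 3, 4))

# The conv acts frame-wise, so pooling over time removes any trace of the shuffle.
print(torch.allclose(pooled, pooled_shuffled, atol=1e-6))  # True
```

With kernel_size=(3, 3, 3), the check no longer passes, because neighboring frames are mixed before pooling, so the shuffled branch really does lose the temporal order.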