CeZh closed this issue 4 months ago
The input instance feature actually has no physical meaning, and it will not be retained in the computation graph during tracing. The computation graph retains only the key points from the first layer, so we consider the gradient of this instance feature to be unimportant.
Disabling the gradient of `fix_scale` stems from our original design intention to fix these key points, ensuring that they can always sample features.
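To make the idea of fixed key points concrete, here is a minimal sketch in plain Python. It assumes the fixed key points sit at constant fractions of the box size around the box center. The `FIX_SCALE` values and the `fixed_key_points` helper are hypothetical illustrations rather than the repository's code, and box rotation is ignored for brevity.

```python
# Hypothetical constant offsets, expressed as fractions of the box size.
# They are never updated by training, which mirrors a disabled gradient.
FIX_SCALE = [
    (0.0, 0.0, 0.0),
    (0.45, 0.0, 0.0),
    (-0.45, 0.0, 0.0),
    (0.0, 0.45, 0.0),
    (0.0, -0.45, 0.0),
    (0.0, 0.0, 0.45),
    (0.0, 0.0, -0.45),
]


def fixed_key_points(center, size):
    """Return key points at constant fractions of the box size (yaw ignored)."""
    return [
        tuple(c + frac * s for c, frac, s in zip(center, offsets, size))
        for offsets in FIX_SCALE
    ]


center = (10.0, 2.0, 0.5)
size = (4.0, 2.0, 1.5)
points = fixed_key_points(center, size)
```

Because every fraction here is below 0.5 in magnitude, each point stays inside the box extent on every axis, whatever the rest of the network has learned.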
Thanks a lot!
Hello Xuewu,
As for the instance_features, I understand your explanation and verified that I can reproduce your results when `requires_grad` is set to false. However, looking closely at the model, doesn't that cause the `self.learnable_fc` inside the `SparseBox3DKeyPointsGenerator` of the first deformable layer (layer_0 in your definition) to stop learning? It seems that the gradient norm of `self.learnable_fc` at the first deformable layer should always be 0, right? Is that on purpose, or should there indeed be a nonzero gradient norm? Thanks a lot!
Another side question, unrelated to this topic: I noticed that you take the natural log of the box shape for the anchors, and that Sparse4D v3 learns the box shape in log scale. May I know the reason? Is it because you want the box shape to take both negative and positive values? I thought Sparse4D learns the offset of the box shape, so it probably would not matter if you did not switch to log scale. I would like to learn more about your insights on this. Thank you!
1) From the perspective of the computation graph, learnable_fc is also not in the computation graph. The actual input of the first layer should directly be the key points, so it is reasonable that learnable_fc does not have gradients, and it will not cause performance loss.
2) Using log-scale follows common practice in 2D detection. Without log-scale, predicting the offset could potentially make the scale negative. However, preventing negative outputs is not the reason we chose log-scale; it was chosen based on experience. We also conducted experiments later, and using the scale directly resulted in better performance for some metrics.
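The remark about negative scales can be illustrated with a small sketch in plain Python. It only shows the numerical property mentioned above, not the reason for the design, which was chosen based on experience. The function names and the example offset are assumptions made for illustration.

```python
import math


def refine_linear(size, offset):
    """Add the predicted offset directly to the size; the result may be negative."""
    return size + offset


def refine_log(log_size, offset):
    """Add the predicted offset in log space; exp() of the result is always positive."""
    return log_size + offset


width = 0.5    # current box width
offset = -0.8  # hypothetical predicted offset

linear_width = refine_linear(width, offset)
log_width = math.exp(refine_log(math.log(width), offset))
```

In log space the offset acts as a multiplicative factor, `exp(offset)`, on the current size, so the decoded size stays positive for any real-valued prediction.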
Hello Xuewu, sorry to bother you again on this issue thread, but I have another question related to the Sparse4D v3 code.
I found that you set the `instance_feature` gradient to `False` in the code. Is that intentional? If it is, may I know the reason? Is it because you always want the instance features to be initialized to zeros? I also saw a few other places where you set the gradient to `False`, for instance in the `KeyPointGenerator`. Just to confirm with you, the reason for this `False` is that these are fix-scale key points, right? Thank you for your time!