HorizonRobotics / Sparse4D

MIT License

A few questions related to the model #48

Closed · CeZh closed this 4 months ago

CeZh commented 6 months ago

Hello Xuewu, sorry to bother you again on this issue thread, but I have another question about the Sparse4D v3 code.

I noticed that you set requires_grad to False for instance_feature in the code. Is that intentional? If so, may I ask the reason? Is it because you always want the instance features to be initialized to zeros? I also saw a few other places where the gradient is disabled, for instance in the KeyPointGenerator; just to confirm, is that because those are fixed-scale key points?
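For context, the pattern I am asking about looks roughly like the following. This is only a minimal sketch of how I read it, not your actual code, and the names and shapes (num_anchor, embed_dims) are placeholders:

```python
import torch
import torch.nn as nn

class InstanceBankSketch(nn.Module):
    # Minimal sketch of the pattern in question, not the actual Sparse4D code.
    def __init__(self, num_anchor=900, embed_dims=256):
        super().__init__()
        # Instance features start as zeros and are excluded from gradient updates.
        self.instance_feature = nn.Parameter(
            torch.zeros(num_anchor, embed_dims), requires_grad=False
        )

    def forward(self, batch_size):
        # Broadcast the constant instance features over the batch dimension.
        return self.instance_feature[None].expand(batch_size, -1, -1)
```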

Thank you for your time!

linxuewu commented 6 months ago

The input instance feature actually has no physical meaning, and it is not retained in the computation graph during training. The computation graph retains only the key points from the first layer, so we consider the gradient of this instance feature unimportant.

linxuewu commented 6 months ago

Disabling the gradient of fix_scale comes from our original design intention of fixing those key points, which ensures that they can always sample features.
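In simplified form, the idea is roughly the following. This is an illustrative sketch rather than the exact implementation; the fix_scale values, shapes, and the number of learnable points are just examples:

```python
import torch
import torch.nn as nn

class KeyPointsGeneratorSketch(nn.Module):
    # Illustrative sketch of fixed plus learnable key points, not the exact Sparse4D code.
    def __init__(self, embed_dims=256, num_learnable_pts=6):
        super().__init__()
        # Fixed offsets (in units of the box size) that always cover the box,
        # so features can always be sampled; their gradient is disabled on purpose.
        fix_scale = torch.tensor([
            [0.0, 0.0, 0.0],
            [0.45, 0.0, 0.0], [-0.45, 0.0, 0.0],
            [0.0, 0.45, 0.0], [0.0, -0.45, 0.0],
            [0.0, 0.0, 0.45], [0.0, 0.0, -0.45],
        ])
        self.fix_scale = nn.Parameter(fix_scale, requires_grad=False)
        # Learnable offsets predicted from the instance feature.
        self.learnable_fc = nn.Linear(embed_dims, num_learnable_pts * 3)

    def forward(self, instance_feature, center, size):
        # instance_feature: (B, N, C), center: (B, N, 3), size: (B, N, 3)
        bs, n = center.shape[:2]
        fixed = self.fix_scale[None, None] * size[:, :, None]
        learnable = self.learnable_fc(instance_feature).reshape(bs, n, -1, 3) * size[:, :, None]
        return torch.cat([fixed, learnable], dim=2) + center[:, :, None]
```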

CeZh commented 6 months ago

Thanks a lot!

CeZh commented 5 months ago

Hello Xuewu, regarding instance_feature, I understand your explanation and have verified that I can reproduce your results with requires_grad set to False. However, when I take a closer look at the model, doesn't that prevent self.learnable_fc inside SparseBox3DKeyPointsGenerator from learning in the first deformable layer (layer_0 in your definition)? It seems the gradient norm of self.learnable_fc at the first deformable layer should always be 0, right? May I know whether that is on purpose, or should there indeed be a non-zero gradient? Thanks a lot!
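For reference, this is roughly how I looked at the gradient norms after a backward pass; `model` and the parameter-name filter are placeholders for my own setup:

```python
# Rough check of per-parameter gradient norms after loss.backward();
# "model" and the name filter are placeholders for my own config.
for name, param in model.named_parameters():
    if "learnable_fc" in name and param.grad is not None:
        print(name, param.grad.norm().item())
```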

A side question not related to this topic: I noticed that you take the natural log of the box shape for the anchors, so Sparse4D v3 learns the box shape in log scale. May I know the reason? Is it because you want the box shape to be able to take both negative and positive values? Since Sparse4D learns an offset to the box shape, I thought it probably would not matter whether you switch to log scale. I would love to hear more of your insight on this. Thank you!

linxuewu commented 5 months ago

1) From the perspective of the computation graph, learnable_fc in the first layer is likewise not part of it. The actual input to the first layer is effectively the key points themselves, so it is reasonable that learnable_fc receives no gradient, and this does not cause any performance loss.

2) Using log scale follows common practice in 2D detection. Without log scale, predicting an offset could make the scale negative. However, preventing negative outputs is not the reason we chose log scale; it was chosen based on experience. We also conducted experiments later, and using the scale directly resulted in better performance on some metrics.
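A quick numeric illustration of that point, with purely made-up numbers rather than code from the repository:

```python
import torch

# Comparing a direct offset on the box width with the same offset applied in log
# space; the numbers are illustrative only, not taken from the repository.
anchor_width = torch.tensor(0.5)                   # meters
offset = torch.tensor(-0.8)                        # an aggressive regression output

direct = anchor_width + offset                     # -0.3 -> invalid negative width
log_space = (anchor_width.log() + offset).exp()    # ~0.22 -> always positive
```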