emecercelik / ssl-3d-detection

Pointpillars and centerpoint same weights #6

Open antoniskef opened 1 year ago

antoniskef commented 1 year ago

Hi, I would like to ask whether you use the same weights (from the PointPillars scene flow training) for both the PointPillars and CenterPoint object detection downstream tasks. If so, how is that possible, given that they use different voxel encoders (HardVFE vs PillarFeatureNet) and different necks (FPN vs SECONDFPN)? Thank you very much.

MingyuLiu1 commented 1 year ago

Hi, we pre-trained PointPillars and CenterPoint individually.

HenryJunW commented 11 months ago

May I ask how long it takes to train the PointPillars and CenterPoint scene flow backbones? Thanks!

MingyuLiu1 commented 10 months ago

Hi, it takes around one day for PointPillars on one 3090. CenterPoint will take longer.

HenryJunW commented 10 months ago

Thank you so much! One more question regarding training the CenterPoint scene flow backbone on nuScenes: I don’t see any config file ending with ‘flow’ here, https://github.com/emecercelik/ssl-3d-detection/tree/master/mmdetection3d/mmdetection3d/configs/centerpoint. Am I missing something? Thanks!

MingyuLiu1 commented 10 months ago

Hi, we did not provide code for CenterPoint training, sorry about that. However, you can adapt the PointPillars code for it.

HenryJunW commented 10 months ago

I see. Thanks for your reply! Can you elaborate a little on which files I need to modify? From my current understanding, I need to change pillar_encoder.py to use the scene flow backbone. Is there anything else I need to modify? Thanks.

In addition, if I want to train the PointPillars-based scene flow backbone, do I need to change max_epochs from 24 to 4 here, https://github.com/emecercelik/ssl-3d-detection/blob/e605ad616278cdd7cb0c6cd5b8479c8c3921c158/mmdetection3d/mmdetection3d/configs/_base_/schedules/schedule_2x.py#L14? I ask because, according to the paper, the scene flow auxiliary training runs for 4 epochs. Thanks again!

MingyuLiu1 commented 10 months ago

  1. You need to modify pillar_encoder.py and call the SSL cycle loss to calculate the loss.
  2. Yes, please set it to 4.

HenryJunW commented 10 months ago

Thanks for your comments! I will try that.

HenryJunW commented 10 months ago

@MingyuLiu1 I have one question regarding the initialization of 3D detectors (i.e., PointPillars) from the scene flow training. The subsampled points' features used for the cycle_loss calculation during scene flow training come from pts_voxel_encoder, as here: https://github.com/emecercelik/ssl-3d-detection/blob/e605ad616278cdd7cb0c6cd5b8479c8c3921c158/mmdetection3d/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py#L204

So are only the pts_voxel_encoder’s weights of PointPillars (i.e., HardVFE) initialized with the pre-trained weights from the self-supervised scene flow training? In other words, are the weights for middle_encoder/pts_backbone not initialized from the scene flow training? Please correct me if I am wrong. Thanks for your clarification.

MingyuLiu1 commented 8 months ago

Yes, exactly. After the middle_encoder/pts_backbone, the features are no longer point-wise features, so it is not suitable to combine them with the point coordinates to calculate scene flow. According to our experiments, utilizing the features after middle_encoder/pts_backbone hurts the final detector performance. However, for the PointGNN model, because it always operates on point features, the whole backbone can be used for SSL scene flow training. Hope the explanation is clear :))
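
For illustration, initializing only the voxel encoder amounts to keeping the checkpoint entries under the `pts_voxel_encoder.` prefix and dropping the rest. The sketch below is not the repository's loading code: the helper `select_by_prefix` is hypothetical, and the parameter keys in the toy dictionary are made-up examples standing in for a real checkpoint's `state_dict`.

```python
from typing import Any, Dict, Mapping


def select_by_prefix(state_dict: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Return only the entries whose key starts with `prefix` (hypothetical helper)."""
    return {key: value for key, value in state_dict.items() if key.startswith(prefix)}


# Toy stand-in for a scene flow checkpoint; keys and values are illustrative only.
pretrained = {
    "pts_voxel_encoder.layer0.weight": "encoder weight",
    "pts_voxel_encoder.layer0.bias": "encoder bias",
    "pts_middle_encoder.layer0.weight": "middle encoder weight",
    "pts_backbone.layer0.weight": "backbone weight",
}

# Keep the point-wise voxel encoder weights; everything after it is discarded.
encoder_only = select_by_prefix(pretrained, "pts_voxel_encoder.")
print(sorted(encoder_only))
```

With a real PyTorch model, a filtered dictionary like this could then be passed to `load_state_dict(..., strict=False)` so that the remaining modules keep their default initialization.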