georghess / voxel-mae

Code for the paper "Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds"
Apache License 2.0

Implementation of voxel-mae in centerpoint #16

Closed hht1996ok closed 12 months ago

hht1996ok commented 1 year ago

Hi @georghess, thanks for the outstanding work! While reproducing voxel-mae in CenterPoint, I was confused about the pre-training network structure.

  1. Did you add the backbone and neck to the self-supervised network, or did you only use the middle encoder to extract features?
  2. If the backbone and neck are used, how do you upsample the BEV feature map back to the original voxel resolution?

georghess commented 1 year ago

Hi @hht1996ok,

By "reproduce voxel-mae in centerpoint", do you mean using the original centerpoint model, e.g., a convolution-based backbone? Or do you mean this implementation which uses SST backbone with centerpoint-style detector?

  1. We added backbone and neck to the self-supervised network.
  2. We do not have to do any upsampling, as the SST backbone keeps the same spatial resolution through the entire network. However, I have also experimented with a conv-based backbone. This was a while ago, but if I remember correctly, I simply ran the decoder at the coarser resolution, matching the output of the VoxelNet/PointPillars backbone.
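To make the "decode at the coarser resolution" idea concrete, here is a minimal sketch of how reconstruction targets could be pooled from the fine voxel grid down to the backbone's output stride, so the decoder head predicts at the coarse BEV resolution instead of upsampling features. The function name, the occupancy-fraction target, and the stride value are my own illustration, not code from this repo:

```python
import numpy as np

def coarse_occupancy_targets(fine_occ, stride=8):
    """Pool a fine-resolution BEV occupancy grid down to the backbone's
    output stride. Each coarse cell holds the fraction of occupied fine
    voxels inside it, giving the decoder a target at its own resolution."""
    H, W = fine_occ.shape
    assert H % stride == 0 and W % stride == 0
    # group each (stride x stride) patch of fine voxels into one coarse cell
    pooled = fine_occ.reshape(H // stride, stride, W // stride, stride)
    return pooled.mean(axis=(1, 3))

# toy example: 16x16 fine grid, stride 8 -> 2x2 coarse targets
fine = np.zeros((16, 16))
fine[:8, :8] = 1.0  # one quadrant fully occupied
targets = coarse_occupancy_targets(fine, stride=8)
# targets[0, 0] is 1.0, the other three coarse cells are 0.0
```

The design point is that the loss is computed against these coarse targets directly, so no deconvolution/upsampling path is needed on top of a downsampling backbone.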
hht1996ok commented 1 year ago

Hi @georghess, sorry, I misunderstood the meaning of Table 3 and mistakenly thought that you had adapted voxel-mae to CenterPoint. From my experiments, I find that the MAE-based method is not effective for models with a convolution-based backbone. What do you think?

georghess commented 1 year ago

No worries.

Yes, with my naive implementation I only saw very minor gains for CNN-based backbones. You could take a look at https://github.com/chaytonmin/Occupancy-MAE for a more thorough study of this, though.