Why do you perform downsampling after the first layer in 3DETR-m, rather than after the whole encoder?

facebookresearch / 3detr

Code & Models for 3DETR - an End-to-end transformer model for 3D object detection

Apache License 2.0

Why do you perform downsampling after the first layer in 3DETR-m, rather than after the whole encoder? #36

Open ch3cook-fdu opened 2 years ago

ch3cook-fdu commented 2 years ago

Would downsampling after the whole encoder lead to a performance drop, or was this choice made because of the computational cost?

imisra commented 2 years ago

We followed PointNet++ for this design decision, where the downsampling is performed after the first layer. In initial experiments, directly downsampling gave worse results.

ch3cook-fdu commented 2 years ago

I mean, why not "PointNetSA -> Encoder -> Encoder -> Encoder -> DownSampling" rather than "PointNetSA -> Encoder -> DownSampling -> Encoder -> Encoder"? Since DownSampling in PointNet++ is known to lose information, "PointNetSA -> DownSampling -> Encoder -> Encoder -> Encoder" would not be a good choice.
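To put the computational-cost side of the question in rough numbers, here is a minimal sketch comparing the three orderings above. It assumes that self-attention cost per encoder layer grows with the square of the number of points it sees. The point counts (2048 before downsampling, 1024 after), the three-layer encoder, and the helper names are illustrative assumptions, not values or code taken from this thread or from the 3DETR repository.

```python
def points_per_layer(n_in, n_out, num_layers, downsample_after):
    """Number of points each encoder layer attends over.

    downsample_after = k means downsampling happens after k encoder layers
    (0 = before the first layer, num_layers = after the whole encoder).
    """
    return [n_in if i < downsample_after else n_out for i in range(num_layers)]


def attention_pairs(point_counts):
    """Total query-key pairs across layers (a proxy for attention cost)."""
    return sum(n * n for n in point_counts)


# Hypothetical sizes: 2048 points in, 1024 points after downsampling, 3 layers.
N_IN, N_OUT, LAYERS = 2048, 1024, 3

orderings = {
    "PointNetSA -> DownSampling -> Encoder x3": 0,
    "PointNetSA -> Encoder -> DownSampling -> Encoder x2": 1,
    "PointNetSA -> Encoder x3 -> DownSampling": 3,
}

costs = {
    name: attention_pairs(points_per_layer(N_IN, N_OUT, LAYERS, k))
    for name, k in orderings.items()
}

for name, cost in costs.items():
    print(f"{name}: {cost} query-key pairs")
```

Under these assumptions the three orderings cost in the ratio 1 : 2 : 4, so downsampling after the first layer sits between the two extremes. This only illustrates the cost trade-off; it says nothing about accuracy, which would have to be measured.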