drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

Transformer_blocks architecture in SPT #105

Closed: MenglinQiu closed this issue 2 months ago

MenglinQiu commented 2 months ago

Hi bro, this is excellent work that combines superpoints, transformers, and hierarchical graphs, and achieves great results. I have a question about the transformer block. As we all know, a standard transformer block is composed of Multi-Head Self-Attention (MSA) and a Feed-Forward Network (FFN), so why is the FFN not needed in SPT? Does the FFN cause a performance loss? Could you give me some guidance? Thank you very much!
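For reference, here is a minimal PyTorch sketch of what I mean by a "standard" pre-norm transformer block. This is just a textbook illustration, not code from this repository, and the class name `StandardTransformerBlock` is made up for this example:

```python
import torch
import torch.nn as nn

class StandardTransformerBlock(nn.Module):
    """Textbook pre-norm transformer block: MSA followed by an FFN."""

    def __init__(self, dim, num_heads=4, ffn_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_ratio * dim),
            nn.GELU(),
            nn.Linear(ffn_ratio * dim, dim),
        )

    def forward(self, x):
        # Multi-Head Self-Attention with a residual connection
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Feed-Forward Network with a residual connection
        x = x + self.ffn(self.norm2(x))
        return x

x = torch.randn(2, 32, 64)                    # (batch, tokens, channels)
print(StandardTransformerBlock(64)(x).shape)  # torch.Size([2, 32, 64])
```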

drprojects commented 2 months ago

Hi @MenglinQiu, we experimentally found the FFN to be optional, depending on the dataset. For S3DIS, for instance, using an FFN degraded performance; for KITTI-360, however, a small FFN slightly improved results. In transformer architectures, FFN blocks can quickly account for a large portion of the model weights. As discussed in our paper, our hierarchical superpoint structure strongly reduces the effective size of the datasets, which exposes large models to overfitting. We show this behavior in our ablations, where increasing model size does not necessarily increase performance. This does not mean that Superpoint Transformer models do not scale or that FFNs are inherently useless, but rather that the 3D datasets must be greatly expanded for larger SPT models to be worth exploring.
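As a rough illustration of this point (only a sketch, not the block actually used in SPT; `AttentionBlock` and the `ffn_ratio` switch are hypothetical names for this example), making the FFN optional and counting parameters shows how quickly it dominates the weight budget:

```python
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Attention block with an optional FFN (illustration only, not SPT code).

    ffn_ratio=0 disables the FFN entirely; a small positive ratio adds a
    lightweight FFN.
    """

    def __init__(self, dim, num_heads=4, ffn_ratio=0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = None
        if ffn_ratio > 0:
            hidden = max(1, int(ffn_ratio * dim))
            self.norm2 = nn.LayerNorm(dim)
            self.ffn = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.GELU(),
                nn.Linear(hidden, dim),
            )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        if self.ffn is not None:
            x = x + self.ffn(self.norm2(x))
        return x

# Parameter count with and without the FFN: the FFN quickly dominates.
dim = 64
no_ffn = sum(p.numel() for p in AttentionBlock(dim, ffn_ratio=0).parameters())
with_ffn = sum(p.numel() for p in AttentionBlock(dim, ffn_ratio=4).parameters())
print(no_ffn, with_ffn)
```

With `dim=64`, the standard 4x FFN roughly triples the block's parameter count, which is why dropping or shrinking it matters on small, superpoint-reduced datasets.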