drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

Transformer_blocks architecture in SPT #105

Closed: MenglinQiu closed this issue 2 months ago

MenglinQiu commented 2 months ago

Hi bro, this is excellent work that combines superpoints, transformers, and hierarchical graphs, and achieves great results. I have a question about the transformer block. As we all know, a standard transformer block is composed of Multi-Head Self-Attention (MSA) and a Feed-Forward Network (FFN), so why is the FFN not needed in SPT? Does the FFN cause a performance loss? Could you give me some guidance? Thank you very much!
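For reference, here is a minimal PyTorch sketch of what I mean by a "standard" pre-norm transformer block. This is just a textbook illustration, not code from this repository, and the class name `StandardTransformerBlock` is made up for this example:

```python
import torch
import torch.nn as nn

class StandardTransformerBlock(nn.Module):
    """Textbook pre-norm transformer block: MSA followed by an FFN."""

    def __init__(self, dim, num_heads=4, ffn_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_ratio * dim),
            nn.GELU(),
            nn.Linear(ffn_ratio * dim, dim),
        )

    def forward(self, x):
        # Multi-Head Self-Attention with a residual connection
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Feed-Forward Network with a residual connection
        x = x + self.ffn(self.norm2(x))
        return x

x = torch.randn(2, 32, 64)                    # (batch, tokens, channels)
print(StandardTransformerBlock(64)(x).shape)  # torch.Size([2, 32, 64])
```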

drprojects commented 2 months ago

Hi @MenglinQiu, we experimentally found the FFN to be optional, depending on the dataset. For S3DIS, for instance, using an FFN degraded performance; for KITTI-360, however, a small FFN slightly improved results. In transformer architectures, FFN blocks can quickly account for a large portion of the model weights. As discussed in our paper, our hierarchical superpoint structure strongly reduces the effective size of the datasets, which exposes large models to overfitting. We show this behavior in our ablations, where increasing model size does not necessarily increase performance. This does not mean that Superpoint Transformer models do not scale or that FFNs are inherently useless, but rather that the 3D datasets must be greatly expanded for larger SPT models to be worth exploring.
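As a rough illustration of this point (only a sketch, not the block actually used in SPT; `AttentionBlock` and the `ffn_ratio` switch are hypothetical names for this example), making the FFN optional and counting parameters shows how quickly it dominates the weight budget:

```python
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Attention block with an optional FFN (illustration only, not SPT code).

    ffn_ratio=0 disables the FFN entirely; a small positive ratio adds a
    lightweight FFN.
    """

    def __init__(self, dim, num_heads=4, ffn_ratio=0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = None
        if ffn_ratio > 0:
            hidden = max(1, int(ffn_ratio * dim))
            self.norm2 = nn.LayerNorm(dim)
            self.ffn = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.GELU(),
                nn.Linear(hidden, dim),
            )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        if self.ffn is not None:
            x = x + self.ffn(self.norm2(x))
        return x

# Parameter count with and without the FFN: the FFN quickly dominates.
dim = 64
no_ffn = sum(p.numel() for p in AttentionBlock(dim, ffn_ratio=0).parameters())
with_ffn = sum(p.numel() for p in AttentionBlock(dim, ffn_ratio=4).parameters())
print(no_ffn, with_ffn)
```

With `dim=64`, the standard 4x FFN roughly triples the block's parameter count, which is why dropping or shrinking it matters on small, superpoint-reduced datasets.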