fudan-zvg / SETR

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
MIT License

Difference with ViT #53

Open TechChuanyu opened 2 years ago

TechChuanyu commented 2 years ago

It looks like this paper uses ViT (https://arxiv.org/abs/2010.11929) as the backbone, with a simple decoder added for segmentation?

To be honest, this seems like it should be an ablation study of ViT rather than a new paper.

If I am wrong, please correct me.