fudan-zvg / SETR

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

SETR-Naive-Base model #49

Closed: kavyasreedhar closed this issue 2 years ago

kavyasreedhar commented 2 years ago

Hi, do you have a Google Drive link for the T-Base models referenced in the paper (such as SETR-Naive-Base), as well as the corresponding configuration files?

Alternatively, what configuration can I use to train the model if it is not readily available? I tried changing the depth in SETR/configs/base/models/setr_naive_pup.py to 12, but that errors out with "RuntimeError: shape '[2, 1025, 3, 12, 85]' is invalid for input of size 6297600" when using the ADE20K configuration file (https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_Naive_512x512_160k_ade20k_bs_16.py) for training. Changing the embedding dimension in this file from 1024 also results in a lot of shape mismatches with the pretrained ImageNet-21k model. The default training with the T-Large depth and embedding dimension works for me with the same file.
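For context, this is roughly the Base configuration I was trying to put together. The numbers below are the standard ViT-Base hyperparameters, not values taken from an official SETR config, and the field names are from memory, so they may not match the repo exactly:

```python
# Sketch of the backbone/head fields I expected a T-Base variant to need.
# All numbers are assumed from the standard ViT-Base hyperparameters, not
# from an official SETR config; field names may not match the repo exactly.
model = dict(
    backbone=dict(
        type='VisionTransformer',
        embed_dim=768,    # ViT-Base hidden size (T-Large uses 1024)
        depth=12,         # ViT-Base has 12 layers (T-Large has 24)
        num_heads=12,     # ViT-Base attention heads (T-Large has 16)
        # a ViT-Base ImageNet-21k checkpoint would also be needed here;
        # the ViT-Large weights cannot be loaded into these smaller shapes
    ),
    decode_head=dict(
        in_channels=768,  # must match the backbone embed_dim
    ),
)
```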

Thanks for your help.

kavyasreedhar commented 2 years ago

Just an update on the training front: it looks like the in_index parameter in decode_heads.py is hard-coded to 23 somewhere. Forcing it to always be -1 (the default value) appears to fix the problem when training with only 12 layers.
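To spell out why the hard-coded value breaks with a shallower backbone, here is a toy sketch (illustrative only, not the actual code in decode_heads.py): the head picks one feature map from the list of per-layer backbone outputs, so index 23 only exists when the backbone has 24 layers.

```python
# Toy illustration (not the actual decode_heads.py code): the decode head
# selects one feature map from the list of per-layer backbone outputs.
layer_outputs = [f"feature_of_layer_{i}" for i in range(12)]  # 12-layer T-Base

feat = layer_outputs[-1]    # in_index = -1: last layer, valid for any depth
# feat = layer_outputs[23]  # in_index = 23: fails with only 12 layers
```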

sixiaozheng commented 2 years ago

Thanks for your question. We will provide the model and config after the holiday.

kavyasreedhar commented 2 years ago

Sounds good, thank you!

kavyasreedhar commented 2 years ago

Hi, I just wanted to check if you could please provide the model and config? Thank you!

sixiaozheng commented 2 years ago

We have provided the config files and model link for SETR-Naive-Base.
