Just an update on the training front: it looks like the in_index parameter in decode_heads.py is hard-coded to 23 somewhere. Forcing it to always be -1 (the default value) appears to fix the problem when training with only 12 layers. A sketch of the corresponding config override is below.
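For anyone hitting the same issue, here is a minimal sketch of the overrides, assuming the repo follows standard mmseg-style config conventions; the field names (depth, embed_dim, num_heads, in_index) are my assumptions about the SETR config and may differ in your checkout.

```python
# Hypothetical partial override for a T-Base run, e.g. layered on top of
# SETR/configs/base/models/setr_naive_pup.py. Field names assume the
# standard mmseg-style config; verify them against your checkout.
model = dict(
    backbone=dict(
        depth=12,       # T-Base has 12 transformer layers (T-Large has 24)
        embed_dim=768,  # must be divisible by num_heads
        num_heads=12,   # 768 / 12 = 64 dims per head, as in ViT-Base
    ),
    decode_head=dict(
        in_index=-1,    # last layer; a hard-coded index of 23 only exists
                        # when depth=24, so -1 generalizes to 12 layers
    ),
)
```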
Thanks for your question. We will provide the model and config after the holiday.
Sounds good, thank you!
Hi, I just wanted to check in: could you please provide the model and config? Thank you!
We have provided the config files and model link for SETR-Naive-Base.
Hi, do you have a Google Drive link for the T-Base models referenced in the paper (such as SETR-Naive-Base), as well as the corresponding configuration files?
Alternatively, what configuration can I use to train the model if it is not readily available? I tried changing the depth in SETR/configs/base/models/setr_naive_pup.py to 12, but that errors out with "RuntimeError: shape '[2, 1025, 3, 12, 85]' is invalid for input of size 6297600" when training with the ADE20K configuration file (https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_Naive_512x512_160k_ade20k_bs_16.py). Changing the embedding dimension in this file from 1024 also results in many shape mismatches with the pretrained ImageNet-21k model. The default training with the T-Large depth and embedding dimension works for me with the same file.
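For context on the reshape error, here is a short arithmetic sketch; the values below are inferred from the error message itself, not taken from the SETR source, so treat them as an assumption about what the config ended up with.

```python
# Why the qkv reshape fails: the tensor size matches T-Large's embedding
# dimension, but the head count matches T-Base, and the two are inconsistent.
batch, tokens, qkv = 2, 1025, 3    # 1025 tokens = 32*32 patches + 1 cls token
embed_dim = 1024                   # T-Large embedding dimension
num_heads = 12                     # T-Base head count

# Total elements in the qkv projection output:
print(batch * tokens * qkv * embed_dim)  # 6297600, matching the error

# A ViT attention layer reshapes to (B, N, 3, num_heads, embed_dim // num_heads),
# which requires embed_dim % num_heads == 0:
print(embed_dim % num_heads)   # 4  -> 1024 is not divisible by 12
print(embed_dim // num_heads)  # 85 -> the invalid last dimension in the error

# Consistent pairings: T-Base uses embed_dim=768 with num_heads=12 (64 per
# head); T-Large uses embed_dim=1024 with num_heads=16 (64 per head).
```

So changing depth alone is not enough: depth, embed_dim, and num_heads need to be switched to the T-Base values together.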
Thanks for your help.