SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arXiv 2022 / CVPR 2023
https://praeclarumjj3.github.io/oneformer
MIT License
1.44k stars 130 forks

Does Q only have 3 possible values? #3

Closed (JunzheJosephZhu closed this 1 year ago)

JunzheJosephZhu commented 1 year ago

According to the paper, the queries Q are only conditioned on "the task is {task}", but {task} only has 3 possible values. So do the queries only have 3 possible values?

praeclarumjj3 commented 1 year ago

Hi @JunzheJosephZhu, thanks for your interest in our work.

The queries are initialized from the mapping of the tokenized "the task is {task}" input (the task token), so the queries have only 3 possible initial values. However, we then update them twice: first inside a transformer right after the initialization with the task token, and then inside the transformer decoder after they are concatenated with the task token. So the queries are updated according to the feature representations from the pixel decoder, and their final values are not limited to 3.
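
To make the flow above concrete, here is a toy sketch in plain Python. It is not the OneFormer code. The helper names (`task_token`, `init_queries`, `cross_attend`, `build_queries`), the fixed sine-based task vectors, the weight-free single-head attention, filling the initial queries by repeating the task token, and the task names panoptic/instance/semantic are all assumptions made for illustration. It only shows that queries with 3 possible initial values become image-dependent once they attend to pixel-decoder features.

```python
import math

# Assumed values of {task}; the thread only says there are 3 of them.
TASKS = ("panoptic", "instance", "semantic")


def task_token(task, dim=4):
    """Hypothetical stand-in for tokenizing and mapping "the task is {task}".

    Returns one fixed vector per task, so there are only 3 possible tokens.
    """
    idx = TASKS.index(task)
    return [math.sin((idx + 1) * (k + 1)) for k in range(dim)]


def init_queries(task, num_queries, dim=4):
    """Initialize num_queries - 1 queries as copies of the task token (assumption)."""
    tok = task_token(task, dim)
    return [list(tok) for _ in range(num_queries - 1)]


def cross_attend(queries, features):
    """Toy cross-attention: softmax(q . f / sqrt(d)) over features, plus a residual.

    No learned weights; this only stands in for a real transformer layer.
    """
    updated = []
    for q in queries:
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, f)) / scale for f in features]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        context = [
            sum(w / total * f[k] for w, f in zip(weights, features))
            for k in range(len(q))
        ]
        updated.append([qk + ck for qk, ck in zip(q, context)])
    return updated


def build_queries(task, features, num_queries=3):
    """Initialize from the task token, update, concatenate the token, update again."""
    dim = len(features[0])
    tok = task_token(task, dim)
    queries = init_queries(task, num_queries, dim)
    queries = cross_attend(queries, features)   # first update, after initialization
    queries = queries + [list(tok)]             # concatenate the task token
    queries = cross_attend(queries, features)   # stands in for the transformer decoder
    return queries


if __name__ == "__main__":
    # Two made-up "images", each given as a short list of feature vectors.
    image_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    image_b = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    same_init = init_queries("semantic", 3) == init_queries("semantic", 3)
    same_final = build_queries("semantic", image_a) == build_queries("semantic", image_b)
    print("identical initial queries:", same_init)
    print("identical final queries:", same_final)
```

In this toy the repeated queries stay identical to one another because there are no learned weights or positional terms; the sketch is only meant to show the dependence on image features, not how individual queries specialize.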

praeclarumjj3 commented 1 year ago

Feel free to re-open if you face any issues.