Yangsenqiao / vida

[ICLR 2024] ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

Question about adapter initialization for segmentation ViDA #6

Open hhhyyeee opened 7 months ago

hhhyyeee commented 7 months ago

Hi, I have a question about the initialization of the adapters in the ViDA segmentation model.

According to the ViDA OpenReview discussion for the NeurIPS '23 submission, I noticed that you tried three versions of adapter initialization for the Cityscapes-to-ACDC experiments: from scratch, ImageNet-pretrained, and source-pretrained. (I think I saw this table somewhere in the paper, maybe in the ICLR '24 supplementary material, but I cannot find it right now.)

Since the experiments section of the paper clearly states that you used the SegFormer Mix Transformer as the backbone of your segmentation model, I am curious how you pretrained the Mix Transformer encoder on ImageNet, given that the dataset is designed for image classification tasks.

It seems possible to extract image features with the Mix Transformer encoder and then feed them into an MLP head for image classification, but I wanted to make sure.
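To make concrete what I mean, something like the following sketch (hypothetical, not code from this repository): pool the encoder's final-stage feature map and attach a linear classification head. `EncoderForClassification` and the conv stub are placeholders I made up; the real MiT encoder would take the stub's place.

```python
# Hypothetical sketch: pretraining a segmentation-style hierarchical
# encoder (e.g. Mix Transformer) on ImageNet by global-average-pooling
# its last-stage feature map and adding a linear classification head.
import torch
import torch.nn as nn

class EncoderForClassification(nn.Module):
    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int = 1000):
        super().__init__()
        self.encoder = encoder              # e.g. MiT backbone without the decode head
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(x)              # (B, C, H', W') last-stage features
        feat = feat.flatten(2).mean(dim=2)  # global average pool -> (B, C)
        return self.head(self.norm(feat))   # ImageNet logits

# Tiny conv stub standing in for the MiT encoder, just to make this runnable.
stub = nn.Conv2d(3, 512, kernel_size=16, stride=16)
model = EncoderForClassification(stub, embed_dim=512)
logits = model(torch.randn(2, 3, 224, 224))  # -> (2, 1000)
```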

Thank you!

Yangsenqiao commented 7 months ago

Thank you for your interest in our work. We pretrain ViDA in the same way as SegFormer, adding the ViDA adapters only to the encoder.
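For readers wondering what "adding ViDA to the encoder" looks like structurally: following the paper's description, each adapted projection gains a low-rank (domain-shared) and a high-rank (domain-specific) branch in parallel with the frozen pretrained layer. A minimal sketch of that structure is below; the ranks are placeholders and the paper's homeostatic scale factors are omitted, so this is illustrative only and not the released implementation.

```python
# Illustrative sketch of a ViDA-style adapted linear layer: a low-rank
# and a high-rank bottleneck branch run in parallel with the frozen
# pretrained projection, and their outputs are summed with its output.
import torch
import torch.nn as nn

class ViDALinear(nn.Module):
    def __init__(self, base: nn.Linear, low_rank: int = 1, high_rank: int = 128):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False        # keep the pretrained weights frozen
        d_in, d_out = base.in_features, base.out_features
        self.low = nn.Sequential(nn.Linear(d_in, low_rank, bias=False),
                                 nn.Linear(low_rank, d_out, bias=False))
        self.high = nn.Sequential(nn.Linear(d_in, high_rank, bias=False),
                                  nn.Linear(high_rank, d_out, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus domain-shared (low-rank) and
        # domain-specific (high-rank) adapter branches.
        return self.base(x) + self.low(x) + self.high(x)

layer = ViDALinear(nn.Linear(256, 256))
out = layer(torch.randn(4, 196, 256))   # -> (4, 196, 256)
```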