nargenziano opened this issue 1 year ago
I have the same question.
The paper states "We pre-train the encoder on the Imagenet-1K dataset".
Does this mean the encoder is first trained on a classification task? If so, is there code for this that you can share? I cannot find it in the repo.
Primarily, I want to be able to reproduce the "mit_*.pth" files, either conceptually or with your code.
following up...
Same question here. It seems the classification head (commented out) in the MiT backbone won't work, because the output of stage 4 is B*49*512 and can't be followed directly by an nn.Linear that outputs B*1000.
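For what it's worth, here is a minimal sketch of the kind of head that would make that shape work, assuming the usual recipe of LayerNorm + global average pooling over the tokens before the linear classifier. This is my assumption, not code from the repo:

```python
import torch
import torch.nn as nn

class MiTClassifierHead(nn.Module):
    """Hypothetical classification head for an MiT backbone (not from the repo)."""
    def __init__(self, embed_dim: int = 512, num_classes: int = 1000):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) token sequence from the last stage, e.g. (B, 49, 512)
        x = self.norm(x)
        x = x.mean(dim=1)    # global average pool over the N tokens -> (B, C)
        return self.head(x)  # -> (B, num_classes)

# quick shape check
tokens = torch.randn(2, 49, 512)
print(MiTClassifierHead()(tokens).shape)  # torch.Size([2, 1000])
```

Without some pooling (or a class token) over the 49 tokens, a plain nn.Linear on the stage-4 output can't produce B*1000, which is presumably why the commented-out head doesn't run as-is.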
following up..
following up...
Hello and thanks for the work.
I was wondering if you could share more info regarding the pre-training of the MiT architectures. I've read in other issues that the configs are the same as pvt_v2, but what is the actual pre-training code you used? Is it the PVT classification training? I tried to edit the PyramidVisionTransformer model to make it identical to MiT B3 and ran PVT's ImageNet classification training from scratch; however, the classification performance was worse than expected for PVT-v2 B3 (around 77.3% Acc@1 instead of the expected 83.1%). What is the expected pre-training performance of MiT?