google-research / pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
Apache License 2.0

About ViT-B #20

Closed jihaonew closed 1 year ago

jihaonew commented 1 year ago

Hi,

have you ever tried to train ViT-B without Obj365 pretraining?

chentingpc commented 1 year ago

We didn't train ViT-B from scratch on COCO. I imagine it could also work if the backbone is pretrained on ImageNet (like typical object detection models) and trained with strong augmentation, but we haven't compared. Pretraining on Objects365 from scratch (without ImageNet pretraining) was easier for us to set up, and it also initializes the decoder.

jihaonew commented 1 year ago

Thanks for the reply. Yes, ImageNet pretraining can indeed work. I am just curious about the performance gap when using ImageNet pretraining instead.