google-research / pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
Apache License 2.0

About ViT-B #20

Closed jihaonew closed 1 year ago

jihaonew commented 1 year ago

Hi,

have you ever tried to train ViT-B without Obj365 pretraining?

chentingpc commented 1 year ago

We didn't train ViT-B from scratch on COCO. I imagine it could also work if the backbone is pretrained on ImageNet (like typical object detection models) and trained with strong augmentation, but we haven't compared. Pretraining on Objects365 from scratch (without ImageNet pretraining) was easier for us to set up, and it also initializes the decoder.

jihaonew commented 1 year ago

Thanks for the reply. Yes, ImageNet pretraining can indeed work. I am just curious about the performance gap when using ImageNet pretraining instead.