Beckschen / ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
Apache License 2.0

Why not take this method (FLIP) to accelerate training? #5

Closed lucasjinreal closed 4 months ago

lucasjinreal commented 5 months ago

https://arxiv.org/abs/2212.00794

Beckschen commented 5 months ago

Hello, thanks for your interest! We follow the same training scheme as DataComp (https://arxiv.org/pdf/2304.14108.pdf) to ensure a fair comparison, so we haven't applied FLIP-style masking.
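For readers unfamiliar with the FLIP method referenced in the issue (https://arxiv.org/abs/2212.00794): it speeds up CLIP training by randomly dropping a large fraction of image patch tokens before the vision encoder, so the encoder processes fewer tokens per image. Below is a minimal numpy sketch of that masking step; the function name and shapes are illustrative, not from this repo.

```python
import numpy as np

def random_patch_mask(patch_tokens, keep_ratio=0.5, rng=None):
    """Randomly keep a subset of patch tokens, FLIP-style (illustrative sketch).

    patch_tokens: (batch, num_patches, dim) array of patch embeddings.
    Returns the kept tokens and their (sorted) indices per example.
    """
    rng = np.random.default_rng(rng)
    b, n, d = patch_tokens.shape
    n_keep = int(n * keep_ratio)
    kept = np.empty((b, n_keep, d), dtype=patch_tokens.dtype)
    idx = np.empty((b, n_keep), dtype=np.int64)
    for i in range(b):
        # Independent random subset per image; sorting preserves patch order.
        idx[i] = np.sort(rng.permutation(n)[:n_keep])
        kept[i] = patch_tokens[i, idx[i]]
    return kept, idx

# Example: a ViT-B/16 at 224x224 yields 196 patch tokens; masking 50%
# leaves 98 tokens, roughly halving the encoder's attention/MLP cost.
tokens = np.random.default_rng(0).normal(size=(2, 196, 768)).astype(np.float32)
kept, idx = random_patch_mask(tokens, keep_ratio=0.5, rng=0)
```

Because the contrastive loss only needs a pooled image embedding, training can tolerate this aggressive dropping, which is the source of FLIP's speedup.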

lucasjinreal commented 5 months ago

Hi, I still want to ask a small question.

Have you considered using next-token prediction to train the ViT instead of contrastive learning? Do you know of any work that has tried this?

Beckschen commented 4 months ago

Thanks, that is a really interesting question. However, we haven't experimented with next-token prediction.