Closed lucasjinreal closed 4 months ago
Hello, thanks for your interest! We follow the same training scheme as DataComp (https://arxiv.org/pdf/2304.14108.pdf) for a fair comparison, so we haven't applied masking.
Hi, I'd still like to ask a small question.
Have you considered training the ViT with next-token prediction instead of contrastive learning? Do you know of any work that has tried this?
Thanks, that is a really interesting question. We haven't experimented with next-token prediction yet.
https://arxiv.org/abs/2212.00794