JJJYmmm opened this issue 8 months ago
Another question: when computing the loss in `AdjustLabelSmoothedCrossEntropyCriterion`, `sample_patch_num` is added to the model input (`sample[0]`, which I believe corresponds to `sample_v1`, the vision-language data). Since `sample_patch_num` seems to select a fixed number of image patch features, why is it only applied to the VL data?
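For reference, here is roughly how I imagine `sample_patch_num` behaves, i.e. keeping only a fixed, randomly chosen subset of image patch features. This is a hypothetical sketch of my understanding, not code copied from the OFA repository; the function name and shapes are my own assumptions.

```python
import torch

def select_patches(patch_embed: torch.Tensor, sample_patch_num: int) -> torch.Tensor:
    """Keep only `sample_patch_num` randomly chosen patches per image.

    patch_embed: (batch, num_patches, dim) image patch features.
    Hypothetical sketch of what `sample_patch_num` appears to do;
    not taken from the OFA source.
    """
    bsz, num_patches, dim = patch_embed.shape
    # Randomly permute the patch indices for each sample in the batch,
    # then keep only the first `sample_patch_num` indices.
    idx = torch.rand(bsz, num_patches).argsort(dim=1)[:, :sample_patch_num]
    # Gather the selected patch embeddings: (batch, sample_patch_num, dim).
    return patch_embed.gather(1, idx.unsqueeze(-1).expand(-1, -1, dim))
```

If that understanding is correct, the same subsampling could in principle be applied to other data types as well, which is why I am asking.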
Also, why are the following actions taken? Is there anything special about cc12m that I missed?
https://github.com/OFA-Sys/OFA/blob/a36b91ce86ff105ac8d9e513aa88f42b85e33479/data/pretrain_data/unify_dataset.py#L321-L323
Looking forward to your reply.