JJJYmmm opened this issue 8 months ago
Another question: when computing the loss in `AdjustLabelSmoothedCrossEntropyCriterion`, `sample_patch_num` is added to the model input (`sample[0]`, which I believe corresponds to `sample_v1`, the vision-language data). Since `sample_patch_num` seems to select a fixed number of image patch features, why is it only applied to the VL data?
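For reference, here is roughly how I imagine `sample_patch_num` behaves, i.e. keeping only a fixed, randomly chosen subset of image patch features. This is a hypothetical sketch of my understanding, not code copied from the OFA repository; the function name and shapes are my own assumptions.

```python
import torch

def select_patches(patch_embed: torch.Tensor, sample_patch_num: int) -> torch.Tensor:
    """Keep only `sample_patch_num` randomly chosen patches per image.

    patch_embed: (batch, num_patches, dim) image patch features.
    Hypothetical sketch of what `sample_patch_num` appears to do;
    not taken from the OFA source.
    """
    bsz, num_patches, dim = patch_embed.shape
    # Randomly permute the patch indices for each sample in the batch,
    # then keep only the first `sample_patch_num` indices.
    idx = torch.rand(bsz, num_patches).argsort(dim=1)[:, :sample_patch_num]
    # Gather the selected patch embeddings: (batch, sample_patch_num, dim).
    return patch_embed.gather(1, idx.unsqueeze(-1).expand(-1, -1, dim))
```

If that understanding is correct, the same subsampling could in principle be applied to other data types as well, which is why I am asking.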
Also, why are the following actions taken? Is there anything special about cc12m that I missed?
https://github.com/OFA-Sys/OFA/blob/a36b91ce86ff105ac8d9e513aa88f42b85e33479/data/pretrain_data/unify_dataset.py#L321-L323
Looking forward to your reply.