OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0

Question about data processing in pretrain data. #434

Open JJJYmmm opened 8 months ago

JJJYmmm commented 8 months ago

Why are the following operations performed here? Is there something special about cc12m that I missed?

https://github.com/OFA-Sys/OFA/blob/a36b91ce86ff105ac8d9e513aa88f42b85e33479/data/pretrain_data/unify_dataset.py#L321-L323

Looking forward to your reply.

JJJYmmm commented 8 months ago

Another question. When computing the loss in AdjustLabelSmoothedCrossEntropyCriterion, sample_patch_num is added to the model input of sample[0], which I think corresponds to sample_v1, the vision-language data.

https://github.com/OFA-Sys/OFA/blob/a36b91ce86ff105ac8d9e513aa88f42b85e33479/criterions/label_smoothed_cross_entropy.py#L177-L178
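To make the behavior being asked about concrete, here is a minimal sketch, not OFA's actual code. It assumes the batch is a list of sub-batches with sample[0] being the vision-language one. `inject_sample_patch_num` is a hypothetical helper, and the `> 0` guard is an assumption about how the option is switched off.

```python
from typing import Any, Dict, List


def inject_sample_patch_num(samples: List[Dict[str, Any]], sample_patch_num: int) -> None:
    """Hypothetical helper: add `sample_patch_num` to the first sub-batch only.

    Assumes samples[0] holds the vision-language data (as read in this issue);
    every other sub-batch is left untouched.
    """
    # Assumption: a non-positive value means "do not sample patches".
    if sample_patch_num > 0 and samples:
        samples[0]["net_input"]["sample_patch_num"] = sample_patch_num


batch = [
    {"net_input": {"src_tokens": [1, 2, 3]}},  # vision-language sub-batch
    {"net_input": {"src_tokens": [4, 5]}},     # some other sub-batch
]
inject_sample_patch_num(batch, 64)  # 64 is an arbitrary example value
print(batch[0]["net_input"])  # {'src_tokens': [1, 2, 3], 'sample_patch_num': 64}
print(batch[1]["net_input"])  # {'src_tokens': [4, 5]}
```

Under this reading, the other sub-batches never receive the key, so a model whose `sample_patch_num` argument defaults to `None` would keep all of their patches.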

It seems that sample_patch_num selects a fixed number of image patch features. So why is it only used for the VL data?

https://github.com/OFA-Sys/OFA/blob/a36b91ce86ff105ac8d9e513aa88f42b85e33479/models/ofa/unify_transformer.py#L759-L769
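For reference, here is a rough sketch of what "selecting a fixed number of image features" could look like. It assumes the linked lines draw `sample_patch_num` distinct patch indices at random for each image and keep only those features. The function name `sample_patches`, the nested-list layout `[batch, num_patches, embed_dim]`, and the random-without-replacement choice are all assumptions for illustration, not a copy of OFA's implementation. Any per-patch positional information would need to be gathered with the same indices, and this sketch omits that step.

```python
import random
from typing import List, Optional, Sequence


def sample_patches(
    image_embed: Sequence[Sequence[Sequence[float]]],
    sample_patch_num: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[List[Sequence[float]]]:
    """Hypothetical sketch: keep `sample_patch_num` random patches per image.

    image_embed is laid out as [batch, num_patches, embed_dim] (nested lists).
    If sample_patch_num is None, all patches are kept in their original order.
    """
    if sample_patch_num is None:
        return [list(patches) for patches in image_embed]
    rng = rng or random.Random()
    out = []
    for patches in image_embed:
        # Draw distinct patch indices (without replacement), separately per image.
        # random.sample raises ValueError if sample_patch_num > num_patches.
        order = rng.sample(range(len(patches)), k=sample_patch_num)
        out.append([patches[i] for i in order])
    return out


# 2 images, 9 patches each, embedding dim 2
feats = [[[float(p)] * 2 for p in range(9)] for _ in range(2)]
subset = sample_patches(feats, sample_patch_num=4, rng=random.Random(0))
print(len(subset), len(subset[0]), len(subset[0][0]))  # 2 4 2
```

Since nothing in this selection step is specific to captioning data, the question above about why only sample[0] receives the argument still stands.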