VDIGPKU / IterNet


In ablation experiments, how much data has been utilized for vision pre-training? #4

Open · SoundingSilence opened 2 years ago

SoundingSilence commented 2 years ago

As mentioned in the paper, you use 20% of the training data (around 16M × 0.2 = 3.2M samples) to train the model, and I have some questions about this. The baseline model ABINet consists of three training stages: vision pre-training, language pre-training, and final training. In your setting, you use 20% of the data for final training, and the BCN language model is no doubt pre-trained on WikiText. But how much data was used for vision pre-training: 100% of the training data, or the 20% subset? I hope to receive your answer. Thanks!

chuxiaojie commented 2 years ago

In the ablation experiments, both vision pre-training and final training use only 20% of the training data.
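
For anyone reproducing this ablation setting, here is a minimal sketch of how one might carve out a fixed 20% subset once and reuse it for both stages. The dataset and the two stage functions are hypothetical stand-ins, not the repo's actual code; only the "same 20% subset for both stages" behavior is taken from the answer above.

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Hypothetical stand-in for the full training set (the real one is ~16M samples).
full_dataset = TensorDataset(torch.arange(1000))

# Draw a fixed 20% subset with a seeded generator so the exact same
# samples are reused for both vision pre-training and final training.
g = torch.Generator().manual_seed(0)
n_subset = int(0.2 * len(full_dataset))
indices = torch.randperm(len(full_dataset), generator=g)[:n_subset]
ablation_subset = Subset(full_dataset, indices.tolist())

print(len(ablation_subset))  # 200, i.e. 20% of 1000

# Both stages would then consume the same subset, e.g.:
# vision_pretrain(ablation_subset)  # hypothetical stage function
# final_train(ablation_subset)      # hypothetical stage function
```

The key point is that the subset is sampled once up front rather than re-sampled per stage, so the vision pre-training and final-training ablations see identical data.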