RLHF-V / RLAIF-V

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

the actual number of samples of the huggingface RLAIF-V-Dataset is 83k, not 30k? #25

Closed: Molly-3000 closed this issue 2 weeks ago

Molly-3000 commented 2 weeks ago

Hi, there~

After reading the parquet files of the RLAIF-V-Dataset downloaded from Hugging Face, I got 83k samples, which is significantly more than the "30k data" mentioned in the README.

Could you please clarify what this additional data is? @yiranyyu
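
A count like this can be reproduced by summing the row counts stored in each parquet file's footer. The following is only a minimal sketch, not necessarily how the number above was obtained: it assumes the files were downloaded to a local directory and that `pyarrow` is installed. The helper names `parquet_num_rows` and `count_samples` are hypothetical, as is the directory name in the usage note.

```python
from pathlib import Path
from typing import Callable, Iterable


def parquet_num_rows(path: Path) -> int:
    """Read the row count from a parquet file's footer metadata (needs pyarrow)."""
    # Imported lazily so count_samples can be used with another reader.
    import pyarrow.parquet as pq

    return pq.ParquetFile(path).metadata.num_rows


def count_samples(
    files: Iterable[Path],
    num_rows: Callable[[Path], int] = parquet_num_rows,
) -> int:
    """Sum the per-file row counts over all given parquet files."""
    return sum(num_rows(f) for f in files)
```

For example, with the files under a hypothetical local folder named `RLAIF-V-Dataset`, `count_samples(sorted(Path("RLAIF-V-Dataset").rglob("*.parquet")))` returns the total number of rows. Reading only the footer metadata avoids loading the image columns into memory.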

yiranyyu commented 2 weeks ago

We uploaded additional training samples that were collected during our additional experiments, such as ablation studies and the training of the MiniCPM-V series models.

Molly-3000 commented 2 weeks ago

Got it. Very helpful, thanks a lot.😀