XiaoLei2123 closed this issue 2 weeks ago
Hi @XiaoLei2123, thanks for your interest in our work~
This error may occur when a single parquet file is larger than 2GB. To deal with it, we previously updated our code to save the final logp parquets in chunks of 5000 samples each. Could you please update to the latest code base and see if the error still occurs?
If you have further questions, we are happy to help!
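For readers who want to see the idea before pulling the update, here is a minimal sketch of saving results in shards of 5000 samples so that no single parquet file grows too large. It is not the repository's actual code: the names `iter_shards`, `write_parquet`, `save_in_shards`, the `logps` file prefix, and the file naming pattern are all hypothetical, and the default writer assumes pandas with a parquet engine such as pyarrow is installed.

```python
import os
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_shards(
    records: Sequence[dict], shard_size: int = 5000
) -> Iterator[Tuple[int, Sequence[dict]]]:
    """Yield (shard_index, chunk) pairs with at most shard_size records each."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    for start in range(0, len(records), shard_size):
        yield start // shard_size, records[start:start + shard_size]


def write_parquet(chunk: Sequence[dict], path: str) -> None:
    """Hypothetical default writer: one parquet file per chunk."""
    # Imported lazily so the chunking logic can be used without pandas.
    import pandas as pd

    pd.DataFrame(list(chunk)).to_parquet(path)


def save_in_shards(
    records: Sequence[dict],
    out_dir: str,
    prefix: str = "logps",  # hypothetical file prefix
    shard_size: int = 5000,
    write_fn: Callable[[Sequence[dict], str], None] = write_parquet,
) -> List[str]:
    """Write records as several small files instead of one large file."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for idx, chunk in iter_shards(records, shard_size):
        path = os.path.join(out_dir, f"{prefix}-{idx:05d}.parquet")
        write_fn(chunk, path)
        paths.append(path)
    return paths
```

Calling `save_in_shards(records, "out_dir")` on 12001 records would write three files holding 5000, 5000, and 2001 rows. Passing a different `write_fn` lets you swap the storage format without touching the chunking logic.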
Thank you for your reply. I will update my code to fix this problem.
Which Python files do I need to update to fix it? Could you provide a list? Thank you!
Hi @XiaoLei2123! The files you need to update are:
muffin/data/datasets.py: https://github.com/RLHF-V/RLAIF-V/blob/main/muffin/data/datasets.py
muffin/eval/muffin_inference_logp.py: https://github.com/RLHF-V/RLAIF-V/blob/main/muffin/eval/muffin_inference_logp.py
You can try this and see if it helps~