Open jungle-gym-ac opened 1 month ago
We are still organizing our training data and will release it soon. Basically, we just deleted the samples with an invalid image (i.e. an image that is empty for some reason) and rewrote the text-only samples so that they fit into our training pipeline. The data we are using is mostly the same as the original llava_v1_5_mix665k.json.
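For anyone who wants to approximate the filtering step before the cleaned file is released, here is a minimal sketch, not the authors' actual script. It assumes the usual LLaVA JSON layout (a list of entries with an `id`, an optional `image` path and `conversations`), and it assumes "invalid" means the image file is missing or zero bytes. `clean_entries`, `is_valid_image` and `image_root` are hypothetical names. How the text-only samples were rewritten is not described above, so the sketch leaves them unchanged.

```python
import os


def is_valid_image(path):
    """Assumed validity check: the file exists and is not zero bytes."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def clean_entries(entries, image_root):
    """Drop entries whose image is missing or empty.

    Returns (kept_entries, dropped_ids). Text-only entries (no "image" key)
    are kept unchanged, because the rewrite applied to them is not specified.
    """
    kept, dropped = [], []
    for entry in entries:
        image = entry.get("image")
        if image is None:
            kept.append(entry)  # text-only sample, left as-is
        elif is_valid_image(os.path.join(image_root, image)):
            kept.append(entry)
        else:
            dropped.append(entry.get("id"))
    return kept, dropped
```

To use it, load llava_v1_5_mix665k.json with `json.load`, pass the resulting list and your local image directory to `clean_entries`, and write the kept entries back out with `json.dump`. A stricter check (for example, actually decoding each image) would catch corrupted files that are not empty.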
Question
Hi, great work! I noticed that the training data path in the training script you provided is llava_v1_5_mix665k_clean_ok.json, which is not the original llava_v1_5_mix665k.json. Did you do any data cleaning or post-processing on the original JSON provided by LLaVA? Thank you!