Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

math dataset incomplete description #62

Open hubenjm opened 1 month ago

hubenjm commented 1 month ago

In https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/datasets_mixture.py#L171C5-L171C6 the math dataset is registered with type 'vflan', but data_prepare/README.md does not make clear which download corresponds to it. My guess is GSM8K-ScRel-SFT. However, the annotation file https://github.com/OFA-Sys/gsm8k-ScRel/blob/main/data/train_use.jsonl does not work directly with the LazyVFlanDataset class (https://github.com/Efficient-Large-Model/VILA/blob/d7d54bc4ca1e582f59516ba2f94a0217ad2430a0/llava/data/dataset.py#L1313), which expects multiple .pkl files to live inside the data_path directory. Could you elaborate on how you converted the original train_use.jsonl file into .pkl files, or whether some other approach was used?

Seerkfang commented 1 month ago

Hi, thanks for using VILA. If you follow the link in data_prepare/README.md, gsm8k-ScRel will take you to the annotation file. Each instance in that file contains a "query" field and a "response" field. We simply reformat them into the following layout:
{'id': 0, 'question': 'Q:Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\nA:', 'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72', 'image': []}
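For reference, here is a minimal conversion sketch based on the layout above. This is not the authors' script: the Q:/A: wrapping of the "query" field, the output filename, and the assumption that LazyVFlanDataset reads each .pkl file as a list of such dicts are all assumptions and may need adjusting against the actual dataset.py code.

```python
import json
import pickle

# Read gsm8k-ScRel train_use.jsonl and map each record into the
# {'id', 'question', 'answer', 'image'} layout shown above.
records = []
with open("train_use.jsonl") as f:
    for idx, line in enumerate(f):
        item = json.loads(line)
        records.append({
            "id": idx,
            # Assumption: the Q:/A: wrapping is added here; skip it if the
            # raw "query" field already contains it.
            "question": "Q:" + item["query"] + "\nA:",
            "answer": item["response"],
            "image": [],  # text-only dataset, so no images
        })

# LazyVFlanDataset scans data_path for .pkl files, so write at least one
# (filename and single-file layout are assumptions for illustration).
with open("math_train_0.pkl", "wb") as f:
    pickle.dump(records, f)
```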

hubenjm commented 3 weeks ago

Thanks for the clarification. It would be great if you could add these details to the README.md file in a future commit.