Question about datasets variants

AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"

https://arxiv.org/abs/2304.01933

Apache License 2.0


Question about datasets variants #66

Open ZeguanXiao opened 2 months ago

ZeguanXiao commented 2 months ago

I noticed that the ft-training_set directory contains several math reasoning dataset variants, such as math_7k.json, math_10k.json, and math_data.json. Apart from math_10k, none of them seems to be explained in detail. How are these datasets related to one another? Can I use them to analyze how the amount of training data affects performance?

AaronZLT commented 3 weeks ago

Hi @ZeguanXiao, I'm just curious whether math50k.json contains all the samples from the other math**k.json files?
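
Until the relationships are documented, one way to check containment locally is to compare the records directly. The following is a minimal sketch, not something taken from the repository: it assumes each file is a JSON list of records, treats two records as identical only if all their fields match exactly, and uses hypothetical paths and helper names (`load_records`, `record_key`, `containment`) that you would adapt to your own checkout.

```python
import json
from pathlib import Path


def load_records(path):
    """Load a JSON file that is assumed to hold a list of records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path} does not contain a JSON list")
    return data


def record_key(record):
    """Canonical string form, so key order does not affect the comparison."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


def containment(small, large):
    """Return (records of `small` also found in `large`, total records in `small`)."""
    large_keys = {record_key(r) for r in large}
    hits = sum(1 for r in small if record_key(r) in large_keys)
    return hits, len(small)


if __name__ == "__main__":
    # Hypothetical paths; adjust them to where the files live locally.
    small_path = Path("ft-training_set/math_7k.json")
    large_path = Path("ft-training_set/math_10k.json")
    if small_path.exists() and large_path.exists():
        hits, total = containment(load_records(small_path), load_records(large_path))
        print(f"{hits}/{total} records of {small_path.name} also appear in {large_path.name}")
```

If every record of the smaller file is found in the larger one, the smaller file is a strict subset under this exact-match assumption. A partial overlap would suggest the files were sampled or cleaned differently, which matters if they are used for a data-volume comparison.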