I noticed that the ft-training_set directory contains several math reasoning datasets, such as math_7k.json, math_10k.json, and math_data.json. Apart from math_10k, none of them appears to be documented in detail. How are these datasets related to one another? And can I use them to analyze how training data volume affects performance?
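One way to check the relationship empirically is to compare the files directly. The following is only a minimal sketch: it assumes each file is a JSON array of objects and that an `instruction` field identifies an example. Both the file layout and the key name are assumptions, not something confirmed by the repository, and `load_keys` and `compare` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_keys(path, key="instruction"):
    """Return the set of values stored under `key` in a JSON array of objects.

    The default key name "instruction" is an assumption about the file schema.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return {rec[key] for rec in records}


def compare(small, large):
    """Summarize how two sets of example keys overlap."""
    shared = small & large
    return {
        "small": len(small),
        "large": len(large),
        "shared": len(shared),
        "small_is_subset": small <= large,
    }
```

For example, `compare(load_keys("ft-training_set/math_7k.json"), load_keys("ft-training_set/math_10k.json"))` would show whether math_7k is a strict subset of math_10k. That matters for a data-volume study, since nested subsets isolate the effect of size better than unrelated samples do.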