Identify the Pareto-optimal tradeoff frontier for quality and diversity.
Identify the optimal QD hyperparameters for downstream fine-tuning performance.
Procedure:
1. Fix a large synthetically generated dataset S.
2. Fix a quality metric and a diversity metric.
3. Fix a sample budget N and subsample S to produce training datasets T_1, ..., T_k with varying quality and diversity.
4. Fine-tune a pre-trained model M separately on each of T_1, ..., T_k and record test performance.
5. Repeat for multiple sample budgets N_1, ..., N_l.
6. Identify the optimal QD parameters for fine-tuning at each sample budget, then try to fit a functional form that predicts the optimal parameters for new sample budgets (a QD scaling law?).
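The subsampling step above could be sketched as follows. Here `quality` is a placeholder scoring function and `alpha` is an assumed knob that interpolates between uniform sampling (high diversity) and greedy top-N-by-quality; both are illustrative assumptions, not a fixed recipe.

```python
import random

def subsample(S, N, quality, alpha, seed=0):
    """Draw an N-example training set from S, biased toward high quality.

    alpha=0 -> uniform subsample of S (diversity-heavy);
    alpha=1 -> greedy top-N by quality (quality-heavy).
    """
    rng = random.Random(seed)
    ranked = sorted(S, key=quality, reverse=True)
    n_top = int(alpha * N)                  # take the best n_top greedily...
    chosen = ranked[:n_top]
    chosen += rng.sample(ranked[n_top:], N - n_top)  # ...fill the rest uniformly
    return chosen

# Sweep alpha to get T_1, ..., T_k with varying quality/diversity.
S = list(range(1000))                       # stand-in corpus; quality = value
datasets = [subsample(S, N=100, quality=lambda x: x, alpha=a)
            for a in (0.0, 0.5, 1.0)]
```

Each resulting dataset would then be used to fine-tune M, with test performance recorded per (alpha, N) cell.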
For the diversity metric, try defining equivalence of two solutions via their order of arithmetic operations. The diversity of a dataset is then its number of unique solutions under this equivalence.
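A minimal sketch of that metric, assuming solutions are plain-text worked answers: canonicalize each solution to its ordered sequence of operators and count unique signatures. The regex-based canonicalization is a simplifying assumption; a real implementation might parse each step instead.

```python
import re

def op_signature(solution: str) -> tuple:
    """Reduce a solution to its ordered sequence of arithmetic operators.

    Two solutions are treated as equivalent iff they apply
    +, -, *, / in the same order.
    """
    return tuple(re.findall(r"[+\-*/]", solution))

def diversity(dataset) -> int:
    """Diversity = number of unique solutions up to operation order."""
    return len({op_signature(sol) for sol in dataset})

sols = [
    "3 + 4 = 7; 7 * 2 = 14",
    "5 + 1 = 6; 6 * 3 = 18",   # same op order as above -> equivalent
    "2 * 3 = 6; 6 + 1 = 7",    # different order -> distinct
]
# diversity(sols) -> 2
```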
Some more food for thought:
It might be nice to have distinct testing regimes: an in-distribution test and an OOD test. Hypothesis: higher quality will correlate with better in-distribution test performance, and higher diversity will correlate with better OOD performance.
Hi @Dahoas,
I was checking this issue. It involves a couple of fine-tuning/training tasks that I already have the code for. I am interested in taking this one up.