axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

Evaluate on specified data #875

Closed: Peter-Devine closed this issue 6 months ago

Peter-Devine commented 11 months ago

โš ๏ธ Please check that this feature request hasn't been suggested before.

🔖 Feature description

I want to evaluate on data that may be distinct from the training data.

Currently, the evaluation data is a random sample of the training data, but I have a situation where I have a lot of training data from a slightly noisy Dataset A and a very small amount of very high-quality data from Dataset B.

I want to be able to train on Dataset A and evaluate on Dataset B.

โœ”๏ธ Solution

When using a Huggingface dataset, it would be nice to use the actual validation split as the eval_dataset for training. This way, you could manually specify which data will be used for training and which for validation.
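To make that concrete, something along these lines is roughly what I have in mind (eval_datasets is a made-up key here, and the dataset paths are just placeholders):

```yaml
# Rough sketch of the requested behavior -- eval_datasets does not exist today.
datasets:
  - path: org/noisy-dataset-a     # Dataset A: large but slightly noisy (placeholder path)
    type: alpaca
    split: train

eval_datasets:                    # hypothetical key: evaluate on a different dataset/split
  - path: org/clean-dataset-b     # Dataset B: small but high quality (placeholder path)
    type: alpaca
    split: validation

val_set_size: 0                   # don't carve a random eval slice out of the training data
```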

I think some code would have to be refactored in https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/data.py

Thanks!

โ“ Alternatives

No response

๐Ÿ“ Additional Context

No response

Acknowledgements

codiceSpaghetti commented 9 months ago

I would need this feature as well

JiyangZhang commented 9 months ago

Any updates on this enhancement? Thanks!

Peter-Devine commented 9 months ago

Bump. It would be really handy to be able to evaluate continuously on a specified dataset, different from the training dataset, so that we could control early stopping etc. based on performance on a target task.

For example, if we are just training on unstructured text but evaluating on a small structured test dataset, this could help us find the optimal amount of training for transferring to the target task.
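Concretely, the kind of run I have in mind looks roughly like this (option names follow the usual axolotl conventions, but treat it as a sketch and check the docs for the exact names):

```yaml
# Sketch: evaluate on the held-out target-task data periodically and stop once
# its metric stops improving; verify option names against the current docs.
eval_steps: 200                  # run evaluation every 200 steps
save_steps: 200                  # keep checkpoints aligned with evaluations
early_stopping_patience: 3       # stop after 3 evaluations without improvement
```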

Thanks.

NanoCode012 commented 6 months ago

Hey, PR #786 adds support for test_dataset: now. We also have bench_dataset if you want to run benchmarks (more info: https://github.com/OpenAccess-AI-Collective/axolotl/issues/311#issuecomment-2028311885).
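Roughly, the config would look something like this (please double-check the exact key name and accepted fields against PR #786 and the docs; the key may be spelled test_datasets in current versions, and the paths below are placeholders):

```yaml
# Hedged sketch -- confirm key name and fields against PR #786 / the docs.
datasets:
  - path: org/noisy-dataset-a     # training data (placeholder path)
    type: alpaca

test_dataset:                     # as mentioned above; may be test_datasets in current versions
  - path: org/clean-dataset-b     # held-out evaluation data (placeholder path)
    type: alpaca
    split: train

val_set_size: 0                   # no random eval split from the training data
```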