About the training data of AlpacaFarm, ToxiGen, TruthfulQA, CodexEval and DS-1000

1170300714 commented 6 months ago

Hi, thanks for your great work! Could you please share the training data for AlpacaFarm, ToxiGen, TruthfulQA, CodexEval and DS-1000? So far I can only find the training data for GSM and TriviaQA, in https://github.com/alisawuffles/proxy-tuning/issues/3.

Thanks a lot!

1170300714 commented 6 months ago

In addition, what is the difference between "Instruction-Tuning", "Code Adaptation" and "Task Finetuning" in terms of training paradigm? In practice, I have reproduced the "Task Finetuning" results in your paper using the open_instruct/finetune.py script that you released. For Instruction-Tuning and Code Adaptation, which training code should I run to start the experiments?

Thanks a lot~

chunmeifeng commented 6 months ago

Hi author, I also need the training data for AlpacaFarm, ToxiGen, TruthfulQA, CodexEval and DS-1000. Could you please share it?

1170300714 commented 6 months ago

Received, thank you for your message.

alisawuffles commented 6 months ago

We do not use training data for any of the tasks you listed. For the instruction-tuning and code adaptation experiments, we use off-the-shelf models (in other words, we did not do any training ourselves). Moreover, the experts are supposed to be general-purpose instruction-following and code models, respectively; they are not task-specific.

For the instruction-tuning experiments, the expert is a general instruction-tuned model from the Llama2 chat series, which you can find on HuggingFace models. For the code adaptation experiments, we use a general code model from the CodeLlama series. You can see how to run the evaluation in the eval scripts.

The three sections evaluate three common use cases of tuning: instruction-tuning, domain adaptation, and task-specific finetuning. The expert model for each section is chosen accordingly.
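
To make concrete why no training data is needed for these two settings, here is a minimal sketch of the decoding-time logit arithmetic that proxy-tuning is built on: the base model's next-token logits are shifted by the difference between the expert's and the anti-expert's logits. This is an illustration under stated assumptions, not the repository's code. The function names and the toy three-token logits are hypothetical, and in a real run the three logit vectors would come from the off-the-shelf models mentioned above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_tuned_probs(base_logits, expert_logits, antiexpert_logits):
    """Hypothetical helper: combine per-token logits at decoding time.

    combined = base + (expert - antiexpert), followed by a softmax.
    No model weights are updated anywhere in this procedure.
    """
    combined = [
        b + (e - a)
        for b, e, a in zip(base_logits, expert_logits, antiexpert_logits)
    ]
    return softmax(combined)

# Toy logits over a 3-token vocabulary (made-up numbers for illustration).
base = [2.0, 1.0, 0.0]        # large untuned model
expert = [0.0, 3.0, 0.0]      # small tuned model, e.g. a chat or code model
antiexpert = [1.0, 0.0, 0.0]  # small untuned counterpart of the expert

probs = proxy_tuned_probs(base, expert, antiexpert)
next_token = probs.index(max(probs))  # greedy choice under the shifted distribution
```

When the expert and anti-expert agree on a token, their difference is zero and the base model's preference is left unchanged, so only the behavior the expert gained from tuning is transferred.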

1170300714 commented 6 months ago

Thanks for your reply!

About the training data of AlpacaFarm, ToxiGen, TruthfulQA, CodexEval and DS-1000 #4