Could you provide the evaluation code for the instruction dataset?

hy010227 commented 2 months ago

Hello, could you provide the evaluation code for the instruction dataset?

GCYZSL commented 2 months ago

Hi, you can refer to the ScienceQA evaluation code that we provide; the README has detailed instructions on how to use it, and it works for all QA tasks. For classification tasks, simply replace the list of choices in the code with the labels of the corresponding task.
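As a rough illustration of the suggestion above (this is not the repository's actual evaluation script, and `extract_choice`, `accuracy`, and the example choice lists are hypothetical names), a choice-matching accuracy computation might look like the sketch below. Switching from a QA task to a classification task then only means passing a different list.

```python
import re
from typing import List, Optional


def extract_choice(generated: str, choices: List[str],
                   ignore_case: bool = False) -> Optional[str]:
    """Return the choice/label that appears earliest in the model output
    as a whole word, or None if no choice is found."""
    flags = re.IGNORECASE if ignore_case else 0
    best, best_pos = None, len(generated) + 1
    for choice in choices:
        match = re.search(r"\b" + re.escape(choice) + r"\b", generated, flags)
        if match and match.start() < best_pos:
            best, best_pos = choice, match.start()
    return best


def accuracy(outputs: List[str], answers: List[str], choices: List[str],
             ignore_case: bool = False) -> float:
    """Fraction of outputs whose extracted choice equals the gold answer."""
    if not answers:
        return 0.0
    correct = sum(
        extract_choice(out, choices, ignore_case) == gold
        for out, gold in zip(outputs, answers)
    )
    return correct / len(answers)


# QA-style task: the choices are option letters (assumed format).
qa_choices = ["A", "B", "C", "D"]
# Classification task: the same code, with the task's labels instead.
sentiment_labels = ["positive", "negative"]
```

If accuracy comes out as exactly 0, printing a few raw outputs next to what `extract_choice` returns is a quick way to tell whether the model or the matching step is at fault.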

hy010227 commented 2 months ago

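Yes, I adapted the ScienceQA template for the instruction dataset. However, because the processed dataset has no test.json file like ScienceQA's, I modified part of the data-processing code in evaluate to read the arrow files directly, with a few small processing changes. The resulting accuracy (acc) is 0, so I am not sure where the problem is.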

GCYZSL commented 2 months ago

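Which instruction dataset are you using? Typical instruction-tuning datasets consist of very general questions, such as math, literature, or poetry writing, and they have no standard test data or files. To probe a model's instruction-following ability, you can use a standard benchmark such as mt-bench. Thanks!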

hy010227 commented 2 months ago

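I am using the MetaMathQA dataset. I suspect the dataset processing at evaluation time is still the problem: I processed the dataset following the ScienceQA template, and that processing is probably not right. When I follow the ScienceQA evaluation template, batch['instruction'] and batch['input'] both end up empty and only batch['output'] has a value. I am not sure whether this is because I read the arrow files in the test split of meta_moe_50k.hf directly, but otherwise I do not know how to process the dataset, because preparation_instruction_data.py does not produce a test.json file after preprocessing. Could you provide a template for processing the instruction dataset at evaluation time?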
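One way to narrow down a symptom like this is to count empty fields before any scoring happens. The sketch below is only a diagnostic under stated assumptions: it assumes the test records have already been loaded into a list of dicts with the keys `instruction`, `input`, and `output`, and `count_empty_fields` is a hypothetical helper, not part of the repository.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence

# Field names taken from the batch keys mentioned above.
FIELDS = ("instruction", "input", "output")


def count_empty_fields(records: Iterable[Mapping],
                       fields: Sequence[str] = FIELDS) -> Dict[str, int]:
    """Count, per field, how many records have it missing, None, or blank."""
    empty = Counter()
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or not str(value).strip():
                empty[field] += 1
    return {field: empty[field] for field in fields}


# Made-up records that mimic the symptom: the prompt fields are blank.
sample_records = [
    {"instruction": "", "input": None, "output": "4"},
    {"instruction": "Add 2 and 2.", "input": "", "output": "4"},
]
report = count_empty_fields(sample_records)
```

If `instruction` is empty for every record, the model is being prompted with nothing, which by itself would explain an accuracy of 0 regardless of how the outputs are scored.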

hy010227 commented 2 months ago

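Alternatively, could you send a sample workflow for converting the instruction dataset used in your example, Open-Orca/OpenOrca, into a file similar to test.json to my email, 3117727519@qq.com? Thank you!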

GCYZSL commented 2 months ago

Hi, if you want to do instruction tuning on mathqa, you can use the preparation_scienceqa_data.py code directly. It produces a test.json.

preparation_instruction_data.py is for general tuning: it trains with next-token prediction over all tokens, so there is no test split. The second setting in our paper trains on OpenOrca first, then trains on ScienceQA, and then tests on ScienceQA.
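To make "next-token prediction over all tokens" concrete, here is a minimal sketch with made-up token ids. `next_token_pairs` is a hypothetical helper for illustration only, not code from the repository: every position in the sequence, prompt and response alike, contributes one (context token, target token) pair.

```python
from typing import List, Sequence, Tuple


def next_token_pairs(token_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair every token with the token that follows it."""
    return list(zip(token_ids[:-1], token_ids[1:]))


example_ids = [11, 42, 7, 99]  # made-up token ids, not from a real tokenizer
pairs = next_token_pairs(example_ids)
```

Because every token is a training target, such data has no separate gold answer field to score against, which fits the point above that this setting comes without a test file.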

hy010227 commented 2 months ago

OK, I understand what you mean. I will try preparation_scienceqa_data.py next. Thank you!

GCYZSL commented 2 months ago

You're welcome, and thank you for your support! We will clean up the code and improve the repo later.

GCYZSL / MoLA

Could you provide the evaluation code for the instruction dataset? #11