How many shots are used for evaluation of HumanEval?

GAIR-NLP / factool

FacTool: Factuality Detection in Generative AI

https://ethanc111.github.io/factool_website/

Apache License 2.0

804 stars 62 forks source link

How many shots are used for evaluation of HumanEval? #39

Open zhimin-z opened 9 months ago

zhimin-z commented 9 months ago

Is that 0-shot and 3-shot CoT?

EthanC111 commented 8 months ago

Hi @zhimin-z . If you are referring to the Self-Check baselines, then yes. Please refer to https://github.com/GAIR-NLP/factool/blob/main/factool/utils/prompts/self_check.yaml

zhimin-z commented 8 months ago

Hi @zhimin-z . If you are referring to the Self-Check baselines, then yes. Please refer to https://github.com/GAIR-NLP/factool/blob/main/factool/utils/prompts/self_check.yaml

Thanks for your quick replies Are these evaluation results from 0-shot or 3-shot CoT?

EthanC111 commented 8 months ago

Hi @zhimin-z. These results were evaluated neither using 0-shot nor 3-shot. The results were evaluated using FacTool. If you are referring to how the individual modules in FacTool are implemented, please refer to our prompts at https://github.com/GAIR-NLP/factool/tree/main/factool/utils/prompts

For some modules in certain tasks we did provide some demonstrations (e.g. query generation for KB-QA), while others we use 0-shot prompting.