NoviScl / MoRE


few shot instruction #1

Open jeesonwang opened 10 months ago

jeesonwang commented 10 months ago

Hello, may I ask what few-shot instruction (prompt) you used when doing the evaluation?

NoviScl commented 10 months ago

Hey, sorry for the late reply. The prompts are all just few-shot demonstrations plus the specialized prompting shown in the prompt figure of the paper.

You can download all the testsets from this link. In each json file (corresponding to each QA dataset), there's a list of demos; we basically concatenate those demonstration examples as the prompt.

So for example, if you want to use the multihop expert, then each demonstration should be the question + CoT rationale + "Therefore, the final answer is xxx." Then you just append your test question to get the CoT rationale and answer prediction. It's the exact same thing for the math expert, except that we use demo examples from GSM8K rather than HotpotQA, and the explanations (CoT) for each question come from the original dataset.
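A minimal sketch of that concatenation, assuming each demo is a dict with `question`, `rationale`, and `answer` fields (these key names are assumptions; check the released json files for the actual schema):

```python
def build_cot_prompt(demos, test_question):
    """Concatenate few-shot CoT demonstrations, then append the test question.

    Each demo is assumed (hypothetically) to carry 'question', 'rationale',
    and 'answer' keys; the rationale ends with the fixed answer template.
    """
    parts = []
    for demo in demos:
        parts.append(
            f"Question: {demo['question']}\n"
            f"{demo['rationale']} "
            f"Therefore, the final answer is {demo['answer']}.\n"
        )
    # The model is expected to continue from here with its own CoT + answer.
    parts.append(f"Question: {test_question}")
    return "\n".join(parts)


demos = [
    {
        "question": "Who wrote the novel adapted into the 1993 film Jurassic Park?",
        "rationale": "Jurassic Park (1993) is based on the 1990 novel by Michael Crichton.",
        "answer": "Michael Crichton",
    }
]
prompt = build_cot_prompt(demos, "In which country is the Eiffel Tower located?")
print(prompt)
```

For the math expert the structure would be identical, just with GSM8K demos and their original dataset explanations as the rationales.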

For the factual expert, the demonstration examples are just the question + answer from NQ, but for each test question, we also prepend the top retrieved passages before the question (you can check the uniqa_predictions_final logs to see the exact formatting). And it's the same thing for the commonsense expert, except that the "passages" are generated by GPT itself rather than retrieved.
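A rough sketch of the factual-expert variant, with retrieved passages prepended before the test question (again, the field names and the exact layout are assumptions; the uniqa_predictions_final logs have the authoritative formatting):

```python
def build_factual_prompt(demos, passages, test_question):
    """Few-shot QA prompt for the factual expert.

    Demos are plain question + answer pairs (e.g. from NQ); the top
    retrieved passages are prepended only before the test question.
    For the commonsense expert, 'passages' would instead be text
    generated by GPT rather than retrieved.
    """
    parts = [f"Question: {d['question']}\nAnswer: {d['answer']}\n" for d in demos]
    context = "\n".join(passages)
    parts.append(f"{context}\nQuestion: {test_question}\nAnswer:")
    return "\n".join(parts)


demos = [
    {"question": "who wrote the declaration of independence", "answer": "Thomas Jefferson"}
]
passages = [
    "Passage: The Declaration of Independence was principally drafted by Thomas Jefferson in 1776."
]
prompt = build_factual_prompt(
    demos, passages, "who drafted the declaration of independence"
)
print(prompt)
```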

I know it sounds a bit confusing, but I'm happy to jump onto a Zoom call if you want to replicate our experiment setup and have additional questions.