evaluation code - Githubissues

Hi Peiyi,

Thanks for waiting! We have just released the multiple-choice evaluation code here.

As for generation evaluation, we use the same inference code released here.

We use the standard prompt for TruthfulQA generation evaluation, i.e.,

[standard_prompt]
### User
[TruthfulQA question]

### Dromedary

as described in our paper.
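
For reference, assembling that prompt in Python might look roughly like the sketch below. The function name `build_generation_prompt`, the exact whitespace between the sections, and the sample question are assumptions for illustration only; they are not taken from the released inference code, so please check them against it.

```python
def build_generation_prompt(standard_prompt: str, question: str) -> str:
    """Fill the template above with a standard prompt and one question.

    Assumption: sections are separated by single newlines, with one blank
    line before the "### Dromedary" header, mirroring the layout above.
    """
    return (
        f"{standard_prompt.rstrip()}\n"
        "### User\n"
        f"{question.strip()}\n"
        "\n"
        "### Dromedary\n"
    )


if __name__ == "__main__":
    # Placeholder text; substitute the real standard prompt here.
    standard_prompt = "[standard_prompt]"
    # Hypothetical question, used only to show the layout.
    question = "Is the Great Wall of China visible from space?"
    print(build_generation_prompt(standard_prompt, question))
```

The model's continuation after the final `### Dromedary` line would then be taken as the answer to score.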

As for BBH, I don't think we reported any systematic evaluation in our paper. But since the inference code is provided, you can try some examples with our ChatBot demo.

IBM / Dromedary

evaluation code #9