IBM / Dromedary

Dromedary: towards helpful, ethical and reliable LLMs.
GNU General Public License v3.0

evaluation code #9

Closed: Wangpeiyi9979 closed this issue 1 year ago

Wangpeiyi9979 commented 1 year ago

Hi, thanks for your excellent work.

Could you please share your evaluation code for TruthfulQA and BBH?

Edward-Sun commented 1 year ago

Hi Peiyi,

Thanks for waiting! We have just released the multiple-choice evaluation code here.

As for generation evaluation, we use the same inference code released here.

We use the standard prompt for TruthfulQA generation evaluation, i.e.,

[standard_prompt]
### User
[TruthfulQA question]

### Dromedary

as described in our paper.
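For anyone assembling this prompt by hand, here is a minimal sketch of how the template above could be filled in. The helper name `build_generation_prompt` is hypothetical (it is not taken from the Dromedary repo), `standard_prompt` stands in for whatever prompt text the paper specifies, and the exact newlines between sections are an assumption based on the layout shown above.

```python
def build_generation_prompt(standard_prompt: str, question: str) -> str:
    """Fill the TruthfulQA generation template shown above.

    Hypothetical helper: the section markers come from the thread,
    but the blank-line placement is an assumption.
    """
    return (
        f"{standard_prompt}\n"
        "### User\n"
        f"{question}\n"
        "\n"
        "### Dromedary\n"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    prompt = build_generation_prompt(
        standard_prompt="[standard_prompt]",
        question="[TruthfulQA question]",
    )
    print(prompt)
```

The model's continuation after the final `### Dromedary` line would then be taken as the answer to score.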

As for BBH, I don't think we reported any systematic evaluation in our paper. But since the inference code is provided, you can try some examples with our ChatBot demo.