I'm not able to match the 3-shot eval results reported in the paper for the pretrained model.
I downloaded the Meditron-7b model from HF.
For example, on MedQA I get 0.353, while the paper reports 0.287±0.008.
My command was: ./inference_pipeline.sh -b medqa4 -c meditron-7b -s 3 -m 0 -out_dir out_dir
On PubMedQA, I got 0.486, but the paper reports 0.693±0.151.
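To make the gap concrete, here is a quick sketch (using only the numbers above, and treating the paper's ± value as a simple interval around the mean) showing that both of my scores fall well outside the reported ranges:

```python
# Reported (mean, ±interval) from the paper vs. my observed 3-shot scores.
reported = {
    "medqa4":   (0.287, 0.008),
    "pubmedqa": (0.693, 0.151),
}
observed = {"medqa4": 0.353, "pubmedqa": 0.486}

for bench, (mean, delta) in reported.items():
    lo, hi = mean - delta, mean + delta
    obs = observed[bench]
    within = lo <= obs <= hi
    print(f"{bench}: observed {obs:.3f}, "
          f"reported {mean:.3f}±{delta:.3f} -> within interval: {within}")
```

Both checks print `within interval: False`, so this doesn't look like run-to-run noise.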