OpenBioLink / ThoughtSource

A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/
MIT License

Datasets: MedQA, MedMCQA, PubMedQA #68

Closed: matthias-samwald closed this issue 1 year ago

matthias-samwald commented 1 year ago

The CoTs for these datasets come from Liévin et al. (2022): https://arxiv.org/abs/2207.08143

matthias-samwald commented 1 year ago

Just a minor observation about the MedMCQA source data (not an issue with our code): in the gold-standard CoTs, certain citations recur very often (e.g. "Ref Harrison20th edition pg 2456" appears more than 60 times). I'm fairly sure some of these citations are incorrect, since the same citation shows up in a wide variety of unrelated contexts.
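For anyone who wants to reproduce this kind of check, here is a minimal sketch of how repeated citations could be counted. It is not part of the ThoughtSource code. The regex, the function names, and the assumption that each explanation is a plain string containing a "Ref ..." fragment are all illustrative, and the sample strings are made up apart from the Harrison citation quoted above.

```python
import re
from collections import Counter

# Assumption: citations are introduced by the word "Ref" and run to the end
# of the line or to the next semicolon. Real data may need a looser pattern.
REF_PATTERN = re.compile(r"\bRef\b[\s:.\-]*([^\n;]+)", re.IGNORECASE)


def normalize(citation):
    """Collapse whitespace, strip edge punctuation, and lowercase."""
    return re.sub(r"\s+", " ", citation).strip(" .,:;").lower()


def count_citations(explanations):
    """Count normalized citation strings across an iterable of explanations."""
    counts = Counter()
    for text in explanations:
        # Missing explanations (None) are treated as empty strings.
        for match in REF_PATTERN.finditer(text or ""):
            key = normalize(match.group(1))
            if key:
                counts[key] += 1
    return counts


def frequent_citations(explanations, min_count=2):
    """Return (citation, count) pairs seen at least min_count times."""
    counts = count_citations(explanations)
    return [(c, n) for c, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Made-up sample explanations, for illustration only.
    sample = [
        "Answer is A. Ref Harrison20th edition pg 2456",
        "Drug of choice is X. Ref Harrison20th edition pg 2456",
        "Ref: Some Textbook 1st edition pg 1",
        None,
        "No citation here.",
    ]
    for citation, n in frequent_citations(sample):
        print(n, citation)
```

Citations that clear the threshold can then be inspected by hand alongside the questions they are attached to, which is how the mismatch between citation and context becomes visible.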