Joshua-Stapleton / smartgpt-answers

This repository contains the outputs for two runs of SmartGPT on the MMLU benchmark.

Format #2

Open arthurwolf opened 1 year ago

arthurwolf commented 1 year ago

PDF isn't really a great format for sharing purposes.

If you added a separate folder with .txt or some other simpler-to-parse format I think that'd help anyone trying to use the data.

Cheers.

arnim commented 1 year ago

Your work is awesome 🎉 Having a machine-readable version might make it even more reusable :)

Joshua-Stapleton commented 1 year ago

@arthurwolf thanks for the comment. The purpose of this repo is to highlight our point on the value of human grading and showcase how historic reliance on machine grading has resulted in a large number of errors remaining undetected. We wouldn't want SmartGPT's output to be machine-readable, as the whole point is that we want humans, not machines, evaluating the answers.

Joshua-Stapleton commented 1 year ago

@arnim thanks for the kind words. I understand the point about reusability, but this is also something we want to be careful with, as simply copying and running machines on large amounts of data is what led to the numerous undetected MMLU errors in the first place. We want to highlight the dangers of mechanical copying and grading - this repo is designed and optimized so that other humans can check our work thoroughly.

tronving commented 1 year ago

> @arnim thanks for the kind words. I understand the point about reusability, but this is also something we want to be careful with, as simply copying and running machines on large amounts of data is what led to the numerous undetected MMLU errors in the first place. We want to highlight the dangers of mechanical copying and grading - this repo is designed and optimized so that other humans can check our work thoroughly.

I think it would be more human-readable in text form as well. On mobile, it's much easier to read a .txt than a .pdf.