arthurwolf opened 1 year ago
Your work is awesome 🎉 Having a machine-readable version might make it even more reusable :)
@arthurwolf thanks for the comment. The purpose of this repo is to highlight the value of human grading and to show how historical reliance on machine grading has let a large number of errors go undetected. We wouldn't want SmartGPT's output to be machine-readable, because the whole point is that answers should be evaluated by humans, not machines.
@arnim thanks for the kind words. I understand the point about reusability, but this is also something we want to be careful with, as simply copying data and running machines over large amounts of it is what led to the numerous undetected MMLU errors in the first place. We want to highlight the dangers of mechanical copying and grading. This repo is designed and optimized so that other humans can check our work thoroughly.
I think it would be more human-readable in text form as well. On mobile, it's much easier to read a .txt than a .pdf.
PDF isn't really a great format for sharing purposes.
If you added a separate folder with .txt or some other simpler-to-parse format, I think that'd help anyone trying to use the data.
Cheers.