praxis

as above, so below
https://src.eco
MIT License

Integrate an evaluation harness #12

Open Vectorrent opened 1 month ago

Vectorrent commented 1 month ago

We will need to test our models against common, industry-standard benchmarks. EleutherAI's lm-evaluation-harness is what everyone uses today: https://github.com/EleutherAI/lm-evaluation-harness

The process will involve:
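For reference, a typical harness invocation against a Hugging Face-compatible checkpoint looks roughly like this. This is only a sketch: the checkpoint path, task list, and batch size are placeholder assumptions, not values from this repo.

```shell
# Install the harness (PyPI package name: lm_eval)
pip install lm_eval

# Run a couple of benchmark tasks against a local checkpoint.
# "./checkpoints/praxis" and the task list are placeholders;
# trust_remote_code is needed for custom architectures.
lm_eval --model hf \
    --model_args pretrained=./checkpoints/praxis,trust_remote_code=True \
    --tasks lambada_openai,hellaswag \
    --device cuda:0 \
    --batch_size 8
```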

Vectorrent commented 1 month ago

I added an eval.py script, which covers most of this work, but it doesn't seem to work right. For some reason, eval suites tend to fail almost immediately with strange tokenization errors. I'm not sure whether that's due to a poorly-trained tokenizer, an under-trained model, or the custom architecture - and I'm not sure how to fix it right now.
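One way to narrow down whether the tokenizer itself is at fault is a round-trip sanity check, independent of the harness. This is a generic sketch: `encode` and `decode` stand in for whatever tokenizer the model actually uses (e.g. a Transformers `AutoTokenizer`), and the probe strings below are arbitrary; the character-level codec is only there to make the example self-contained.

```python
def roundtrip_check(encode, decode, samples):
    """Return the samples that do not survive encode -> decode unchanged.

    encode: str -> list of token ids
    decode: list of token ids -> str
    """
    failures = []
    for text in samples:
        ids = encode(text)
        restored = decode(ids)
        if restored != text:
            failures.append((text, restored))
    return failures

# Usage with a trivial character-level codec standing in for a real tokenizer:
encode = lambda s: [ord(c) for c in s]
decode = lambda ids: "".join(chr(i) for i in ids)
print(roundtrip_check(encode, decode, ["hello", "naïve café", "line\nbreak"]))
```

Note that some tokenizers legitimately normalize whitespace or unicode, so a mismatch here is a lead to investigate, not proof of a broken tokenizer.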

Vectorrent commented 13 hours ago

I was not aware of Hugging Face's evaluate library. It looks pretty nice, and since we use the Huggingface Transformers API already, it would probably be easy to set up. We should try this one.