Port over Stability Evaluation code from Previous RepNoise Project [Dom]

domenicrosati / training-time-domain-authorization

Port over Stability Evaluation code from Previous RepNoise Project [Dom] #3

Open domenicrosati opened 3 months ago

domenicrosati commented 3 months ago

This includes:

Trainability on GEM
Using LM Harness to evaluate model capability
We should also have a perplexity measure on WikiText2 or something similar, since people keep asking about it.

ToDo:

[ ] Add GEM Trainability
[ ] Add LM Harness code
[ ] Add Perplexity Measure
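
For the perplexity item, here is a minimal sketch of the computation itself. It assumes the per-token natural-log probabilities have already been obtained from the model on the evaluation text (for example WikiText2); the function names `perplexity` and `corpus_perplexity` are hypothetical and not taken from the RepNoise code.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of one sequence: exp of the mean negative log-probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def corpus_perplexity(chunks: Sequence[Sequence[float]]) -> float:
    """Token-weighted perplexity over several chunks of a longer text.

    Log-probabilities are summed over all tokens before averaging, so long
    chunks are not under-weighted relative to short ones.
    """
    total_tokens = sum(len(chunk) for chunk in chunks)
    if total_tokens == 0:
        raise ValueError("need at least one token log-probability")
    total_log_prob = sum(sum(chunk) for chunk in chunks)
    return math.exp(-total_log_prob / total_tokens)
```

Averaging over tokens rather than over chunks keeps the number comparable across different chunk sizes, which matters if the text is split to fit the model's context window.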

domenicrosati commented 3 months ago

I think another very valuable thing would be to develop metrics for how the method affects fluency. LM Harness isn't really great at capturing the impact of a method on fluency or on long-form generation.