EleutherAI / elk

Keeping language models honest by directly eliciting knowledge encoded in their activations.

train probe per prompt #271

Open derpyplops opened 1 year ago

derpyplops commented 1 year ago

Solves NOT-291

This is a fairly complex change. It trains a separate reporter model per prompt template, then evaluates each reporter both on its own individual prompt and on the mean credence averaged across prompts. I should also add tests for the new file structure.
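To make the idea concrete, here is a minimal, self-contained sketch of the evaluation scheme described above, using synthetic activations and a tiny NumPy logistic-regression probe. The array shapes, the `train_probe`/`credence` helpers, and the synthetic data are illustrative assumptions, not the actual elk implementation:

```python
import numpy as np

def train_probe(x, y, lr=0.1, steps=500):
    """Tiny logistic-regression probe trained by gradient descent (illustrative)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def credence(x, probe):
    w, b = probe
    return 1 / (1 + np.exp(-(x @ w + b)))

rng = np.random.default_rng(0)
num_prompts, n, d = 3, 200, 8
labels = rng.integers(0, 2, size=n).astype(float)
# Synthetic stand-in for per-prompt hidden-state activations.
acts = [rng.normal(size=(n, d)) + labels[:, None] for _ in range(num_prompts)]

# Train one probe ("reporter") per prompt template.
probes = [train_probe(x, labels) for x in acts]

# Per-prompt evaluation: each probe scored on its own prompt's activations.
per_prompt_acc = [((credence(x, p) > 0.5) == labels).mean()
                  for x, p in zip(acts, probes)]

# Mean-credence evaluation: average each probe's credence across prompts,
# then threshold the averaged credence.
mean_cred = np.mean([credence(x, p) for x, p in zip(acts, probes)], axis=0)
mean_cred_acc = ((mean_cred > 0.5) == labels).mean()
```

The two accuracy numbers correspond to the two evaluation modes mentioned above: scoring each prompt's probe individually versus scoring the ensemble via mean credence.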

The new flag is `--probe_per_prompt`, added to `Run`.

To test, run `elk elicit gpt2 imdb --num_gpus 2 --probe_per_prompt` with and without the flag; `elk eval` should also work.