EleutherAI / elk

Keeping language models honest by directly eliciting knowledge encoded in their activations.
MIT License

Fix seed not exposed #270

Closed: AugustasMacijauskas closed this 1 year ago

AugustasMacijauskas commented 1 year ago

Before the fix, running

elk elicit gpt2 imdb --data.seed=0 --disable_cache

and

elk elicit gpt2 imdb --data.seed=1 --disable_cache

would produce identical results. Moreover, printing the seed on the first line of the load_prompts function showed 42 (the default value) in both cases. https://github.com/EleutherAI/elk/blob/a88c01a07672321e9cc0ac8d32d702e3437deb5c/elk/extraction/prompt_loading.py#L16

After the fix, running the above commands leads to different results (because different data points are sampled from the original training sets), and the print statement now correctly outputs 0 and 1, respectively.
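For illustration, here is a minimal Python sketch of the failure mode and the shape of the fix. Only the load_prompts name and its default seed of 42 come from the actual code; the sampling logic and the elicit_before_fix / elicit_after_fix wrappers are hypothetical stand-ins, not the real elk implementation.

```python
import random


def load_prompts(ds_name: str, seed: int = 42) -> list[int]:
    """Illustrative stand-in for elk's prompt loader: picks example indices."""
    print(seed)  # the print statement mentioned above; before the fix it always showed 42
    rng = random.Random(seed)
    return rng.sample(range(1000), k=5)


# Before the fix: the --data.seed value from the CLI was never forwarded,
# so load_prompts silently fell back to its default of 42.
def elicit_before_fix(cfg_seed: int) -> list[int]:
    return load_prompts("imdb")


# After the fix: the configured seed is threaded through to load_prompts,
# so different seeds sample different data points.
def elicit_after_fix(cfg_seed: int) -> list[int]:
    return load_prompts("imdb", seed=cfg_seed)


assert elicit_before_fix(0) == elicit_before_fix(1)  # identical results: the bug
assert elicit_after_fix(0) != elicit_after_fix(1)    # seeds now actually matter
```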