EleutherAI / sae-auto-interp

https://blog.eleuther.ai/autointerp/
Apache License 2.0

[Improvement] get_activating_examples is going to be a bottleneck #3

Closed SrGonao closed 3 months ago

SrGonao commented 4 months ago

This function is going to be a bottleneck (I had something similar in my code). https://github.com/EleutherAI/sae-auto-interp/blob/9751cb25f22824ec544d2718a3bc4a8e246c326f/sae_auto_interp/features/features.py#L210

I'm not sure about the "l_ctx" and "r_ctx" part, but looping over the unique examples is very slow. I wrote this function, which builds a list of all the unique sentences: https://github.com/EleutherAI/sae-auto-interp/blob/9751cb25f22824ec544d2718a3bc4a8e246c326f/sae_auto_interp/features/features.py#L152
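
To illustrate the kind of cost I mean, here is a minimal sketch. It is not the code at either link. It assumes activations are stored as `(sentence_idx, token_idx)` pairs with one activation value each, and the names `group_by_sentence` and `group_by_sentence_slow` are made up for this example. The slow version rescans every record once per unique sentence; the fast one groups everything in a single pass.

```python
from collections import defaultdict


def group_by_sentence(locations, activations):
    """Group (token_idx, activation) pairs by sentence index in one pass.

    locations: sequence of (sentence_idx, token_idx) pairs (assumed layout)
    activations: sequence of floats aligned with locations
    """
    grouped = defaultdict(list)
    for (sentence_idx, token_idx), act in zip(locations, activations):
        grouped[sentence_idx].append((token_idx, act))
    return dict(grouped)


def group_by_sentence_slow(locations, activations):
    """Baseline: rescan all records once per unique sentence."""
    result = {}
    for sentence_idx in {s for s, _ in locations}:
        result[sentence_idx] = [
            (t, a)
            for (s, t), a in zip(locations, activations)
            if s == sentence_idx
        ]
    return result


# Toy data: five activating tokens spread over three sentences.
locations = [(0, 3), (2, 1), (0, 7), (1, 4), (2, 5)]
activations = [0.5, 1.2, 0.9, 0.3, 2.0]
grouped = group_by_sentence(locations, activations)
```

Both functions return the same mapping, but the baseline does work proportional to (unique sentences × records), while the single pass is proportional to the number of records only.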

I think it would be better to move the trimming of the sentences somewhere else in the code. I feel this is something we will want to experiment with.
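
As a rough sketch of what a separate trimming step could look like: this assumes "l_ctx" and "r_ctx" mean the number of tokens kept to the left and right of the peak activation, which I have not confirmed, and `trim_window` is a hypothetical helper, not something in the repo.

```python
def trim_window(tokens, activations, l_ctx, r_ctx):
    """Keep l_ctx tokens before and r_ctx tokens after the peak activation.

    Assumes tokens and activations are equal-length lists for one sentence.
    """
    if not tokens:
        return [], []
    # Index of the strongest activation in the sentence.
    peak = max(range(len(activations)), key=activations.__getitem__)
    start = max(0, peak - l_ctx)
    end = min(len(tokens), peak + r_ctx + 1)
    return tokens[start:end], activations[start:end]


tokens = ["The", "cat", "sat", "on", "the", "mat", "."]
acts = [0.0, 0.1, 0.0, 0.2, 0.0, 3.0, 0.1]
window_tokens, window_acts = trim_window(tokens, acts, l_ctx=2, r_ctx=1)
```

Keeping this apart from example collection would let us store full sentences once and try different window sizes afterwards without recomputing anything.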