andyzoujm / representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency
https://www.ai-transparency.org/
MIT License

n_difference parameter with clustermean? #17

Closed vthost closed 11 months ago

vthost commented 12 months ago

Hi, thank you for making the code directly available! I have a question about the clustermean method. The code (repe.rep_reading_pipeline, l. 106) contains the statement below, which I think has no effect because the method is actually named 'cluster_mean', so the condition never matches. Independently of that, and just to understand: shouldn't it be n_difference == 0?

if direction_method == 'clustermean':
    assert n_difference == 1, "n_difference must be 1 for clustermean"

Thank you so much already!
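To make the first point concrete, here is a minimal sketch of why the guard would never fire if the method is selected with the string 'cluster_mean', as the question suggests. The function `check_n_difference` is a hypothetical stand-in written for this illustration, not part of the repository; only the two quoted lines come from the code.

```python
def check_n_difference(direction_method, n_difference):
    """Hypothetical wrapper around the quoted guard.

    Returns True if the assertion was actually evaluated, False if the
    branch was skipped because the method name did not match.
    """
    if direction_method == 'clustermean':
        assert n_difference == 1, "n_difference must be 1 for clustermean"
        return True
    return False


# Spelled 'cluster_mean', the branch is skipped, so any value passes silently.
skipped = check_n_difference('cluster_mean', 0)

# Spelled 'clustermean', the guard runs and rejects n_difference == 0.
try:
    check_n_difference('clustermean', 0)
    raised = False
except AssertionError:
    raised = True
```

If the method name really is 'cluster_mean', the assertion is dead code and n_difference is effectively unconstrained for that method.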

andyzoujm commented 11 months ago

Sorry for missing this issue - oops. I think this was an arbitrary design choice. Both settings of n_difference could work under different assumptions.
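One way to read "different assumptions" is sketched below on toy data. It assumes, without confirmation from this thread, that n_difference counts rounds of pairwise differencing (even-indexed row minus the following odd-indexed row) applied to the hidden states before a direction is computed. The helpers `pairwise_difference` and `mean_vector` are hypothetical and written in plain Python for illustration. They are not the library's implementation.

```python
def pairwise_difference(states, n_difference):
    """Apply n_difference rounds of (row 2k minus row 2k+1) differencing."""
    for _ in range(n_difference):
        states = [
            [a - b for a, b in zip(states[i], states[i + 1])]
            for i in range(0, len(states) - 1, 2)
        ]
    return states


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy hidden states: rows alternate between class A and class B.
hidden = [[2.0, 0.0], [0.0, 1.0], [4.0, 0.0], [0.0, 3.0]]

# n_difference == 0: keep the raw states, then subtract the two class means.
# This assumes the class of each row is known.
raw = pairwise_difference(hidden, 0)
direction_from_means = [
    a - b for a, b in zip(mean_vector(raw[0::2]), mean_vector(raw[1::2]))
]

# n_difference == 1: difference each pair first, then average the differences.
# This assumes the inputs are ordered as contrastive pairs.
direction_from_diffs = mean_vector(pairwise_difference(hidden, 1))
```

On this balanced, strictly paired toy input the two routes give the same vector. They rely on different information, though: per-row class labels in the first case and pair ordering in the second.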