andyzoujm / representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency
https://www.ai-transparency.org/
MIT License
716 stars, 86 forks

Question about layer_id #10

Closed (sorcererrandy closed 1 year ago)

sorcererrandy commented 1 year ago

Hi! When I run your notebook, I wonder how I should define layer_id for emotion control.

Looking forward to hearing your answer, thank you!

justinphan3110 commented 1 year ago

Hi @sorcererrandy, layer_id and coeff are tuned parameters. In emotion_concept.ipynb and emotion_function.ipynb, we choose layer_id = list(range(-11, -30, -1)) for 13B Llama2-chat and layer_id = list(range(-5, -18, -1)) for 7B Mistral/Llama2-chat.
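For readers unsure what those two ranges contain, here is a minimal sketch that expands them with plain Python. The variable names `layer_id_13b` and `layer_id_7b` are made up for this illustration and do not appear in the notebooks.

```python
# Expand the two layer_id ranges quoted above.
# range(start, stop, step) excludes `stop`, so the lists end at -29 and -17.
layer_id_13b = list(range(-11, -30, -1))  # hypothetical name, 13B Llama2-chat setting
layer_id_7b = list(range(-5, -18, -1))    # hypothetical name, 7B Mistral/Llama2-chat setting

# Print the number of layers, the first entry, and the last entry of each list.
print(len(layer_id_13b), layer_id_13b[0], layer_id_13b[-1])  # 19 -11 -29
print(len(layer_id_7b), layer_id_7b[0], layer_id_7b[-1])     # 13 -5 -17
```

So the 13B setting covers 19 consecutive layers and the 7B setting covers 13, both counted backwards from the end of the model.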

sorcererrandy commented 1 year ago

Thanks for your answer. I have a further question: how are the layer_id and coeff adjusted? Should I adjust the layer_id based on the accuracy after training the rep_reader, or should I adjust it based on the text output from the model?

andyzoujm commented 1 year ago

We usually use the set of layers that corresponds to high reading accuracy, but feel free to adjust both layer_id and coeff based on the output and your use case.
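One way to act on this advice is sketched below, assuming you already have a per-layer reading accuracy from evaluating the rep_reader. The helper `select_layers`, the threshold of 0.8, and the accuracy values are all hypothetical placeholders, not part of the repository and not measured results.

```python
def select_layers(acc_by_layer, threshold=0.8):
    """Return the layers whose reading accuracy is at least `threshold`.

    `acc_by_layer` maps a layer index to an accuracy in [0, 1].
    The result is sorted in descending order, so with negative indices
    the layers closest to the output come first.
    """
    kept = [layer for layer, acc in acc_by_layer.items() if acc >= threshold]
    return sorted(kept, reverse=True)


# Placeholder accuracies for illustration only (not real measurements).
example_acc = {-1: 0.55, -5: 0.82, -6: 0.85, -7: 0.81, -20: 0.60}

layer_id = select_layers(example_acc, threshold=0.8)
print(layer_id)  # [-5, -6, -7]
```

The resulting list is only a starting point. As noted above, both layer_id and coeff can then be adjusted by inspecting the generated text.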

sorcererrandy commented 1 year ago

I understand. Thanks!

shiqichen17 commented 10 months ago

Hello! Could you please clarify the meaning of the "-" symbol in this context? I'm asking because the code in honesty_control_TQA.ipynb uses positive values instead: layer_ids = np.arange(8, 32, 3). Could you explain why there is a difference here?
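The thread does not answer this question. As a hedged sketch: if the negative values follow ordinary Python indexing (counting back from the last layer), the two notations can be compared as below. The helper `to_positive` is hypothetical, and `num_layers = 32` assumes a model with 32 decoder layers such as 7B Llama2-chat. Whether the notebooks index exactly the same list of layers (for example, whether the embedding output is counted) is not confirmed here.

```python
def to_positive(layer_ids, num_layers):
    """Map Python-style negative indices to the equivalent non-negative ones."""
    return [i % num_layers for i in layer_ids]


num_layers = 32  # assumption: 32 decoder layers

# Same values that np.arange(8, 32, 3) produces, written with plain range().
positive_ids = list(range(8, 32, 3))
print(positive_ids)  # [8, 11, 14, 17, 20, 23, 26, 29]

# The 7B setting from the earlier comment, converted under the assumption above.
negative_ids = list(range(-5, -18, -1))
print(to_positive(negative_ids, num_layers))
# [27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15]
```

Under that assumption, the negative form selects a contiguous block of later layers, while np.arange(8, 32, 3) selects every third layer starting at index 8.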

zhoushang2003 commented 9 months ago

Could @justinphan3110 kindly explain how to define the layer_id for 70B Llama2-chat in emotion_concept and emotion_function?

shiqichen17 commented 9 months ago

I've received your e-mail, and I'll reply soon :) Shiqi Chen