Open semicircle opened 11 months ago
Hi,
Thanks for putting this together. Seems practically very useful. Feel free to open a PR if you'd like to integrate this into the library.
Best, Andy
Hi,
Some updates on this:
Adding this ratio to the activation doesn't guarantee fully stable control.
The `coeff` still has to be adjusted to fit the prompt. Taking 'anger' emotion control as an example, a happy scenario may need a larger `coeff` than a neutral one to make the response sound angry. It seems the activation needs to be adjusted accordingly.
I noticed the newly added `piecewise_linear` operator there, and I am trying to add some code alongside it to implement this feature.
Thanks~
Currently, after training the `rep_reader`, the `coeff` variable used in the control pipeline needs to be tuned purely by experiment, and the value varies a lot. Taking `primary_emotions` as an example, here are the values I found:
This makes it challenging for RepControl to adapt to new models.
My finding is that introducing the PCA model's `explained_variance_ratio_` into the control process can make the manipulation more "gentle" / "accurate".
Here are the key modifications. In rep_readers.py:
Each layer's variance_ratio represents how sparse or variably distributed the direction is, which can be interpreted as a 'confidence' score in the control section for that layer.
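As an illustration of what a per-layer variance ratio looks like, here is a minimal sketch. It is not the modification made in rep_readers.py; it computes the first principal direction and its explained-variance ratio with a plain NumPy SVD, which follows the same definition scikit-learn's PCA uses for `explained_variance_ratio_`. The function name `first_component_ratio`, the layer indices, and the random activations are all hypothetical.

```python
import numpy as np

def first_component_ratio(hidden_states):
    """Return the first principal direction and its explained-variance ratio."""
    # PCA works on mean-centered data.
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions; s holds the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Fraction of total variance captured by each component.
    ratios = s**2 / np.sum(s**2)
    return vt[0], float(ratios[0])

rng = np.random.default_rng(0)
# Hypothetical activations per layer, shape (n_samples, hidden_dim).
layers = {
    -1: rng.normal(size=(64, 8)) * np.array([5, 1, 1, 1, 1, 1, 1, 1]),  # one dominant direction
    -2: rng.normal(size=(64, 8)),                                        # no dominant direction
}

directions, variance_ratios = {}, {}
for layer, h in layers.items():
    directions[layer], variance_ratios[layer] = first_component_ratio(h)
```

In this toy setup the layer with one dominant direction gets a much higher ratio than the isotropic one, which is the sense in which the ratio can act as a per-layer 'confidence' score.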
So, when manipulating the output, the `activation` variable is calculated as:
Applying this method seems to allow all the 7B models I've tested to share a common `coeff` value of around 2.0.
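The scaling described above might look roughly like the following sketch. The original snippet is not reproduced here, so this assumes the variance ratio enters as a plain multiplicative factor next to `coeff`, the direction, and its sign. The function `scaled_activations` and the toy values are hypothetical, and NumPy arrays stand in for the tensors used in the real pipeline.

```python
import numpy as np

def scaled_activations(directions, signs, variance_ratios, coeff=2.0):
    """Build per-layer control vectors: coeff * variance_ratio * sign * direction."""
    return {
        layer: coeff * variance_ratios[layer] * signs[layer] * directions[layer]
        for layer in directions
    }

# Hypothetical unit directions, signs, and variance ratios for two layers.
directions = {-1: np.array([1.0, 0.0]), -2: np.array([0.0, 1.0])}
signs = {-1: 1.0, -2: -1.0}
variance_ratios = {-1: 0.5, -2: 0.1}

activations = scaled_activations(directions, signs, variance_ratios, coeff=2.0)
```

With this form, layers whose direction explains little variance contribute a proportionally smaller push, so a single `coeff` acts less aggressively on low-confidence layers.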
The idea came to me when I saw that WrappedBlock uses the controller (activations) to manipulate the tensor with a simple linear operation, so I took the variance_ratio into account in the simplest possible way. Extracting the PCA model's underlying singular vectors might give even better control over this.
Thanks for sharing this great work!