dtch1997 / sae-eap

Edge attribution patching with SAEs

[Proposal] Support SAE attribution patching via path patching #7

Open dtch1997 opened 1 week ago

dtch1997 commented 1 week ago

Assuming we implement #5, we could natively support SAE node attributions without splicing, as follows:

Insight: SAE attributions can be found via path patching.

How do we do path attribution patching? I think it's just the chain rule applied to EAP.
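
To make the chain-rule idea concrete, here is a minimal toy sketch of one possible reading. It is not this repo's API: every name in it (`sae_encode`, `sae_feature_attributions`, `W_enc`, `W_dec`, `grad_x`) is hypothetical, and it assumes a ReLU SAE with a linear decoder and the (corrupt - clean) sign convention. Under those assumptions, the gradient of the metric with respect to the activation at the SAE's hook point is pushed through the decoder, so the SAE never has to be spliced into the forward pass to get a per-feature score.

```python
# Toy sketch: SAE feature attributions via the chain rule.
# All names are hypothetical and only illustrate the idea.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sae_encode(x, W_enc, b_enc):
    """ReLU encoder: one activation per SAE feature."""
    return [max(0.0, dot(w, x) + b) for w, b in zip(W_enc, b_enc)]

def sae_feature_attributions(x_clean, x_corrupt, grad_x, W_enc, b_enc, W_dec):
    """First-order attribution per SAE feature.

    grad_x is dL/dx at the SAE's hook point. With a linear decoder, the
    chain rule gives dL/da_f = W_dec[f] . grad_x, so the attribution is
    (a_corrupt[f] - a_clean[f]) * (W_dec[f] . grad_x).
    """
    a_clean = sae_encode(x_clean, W_enc, b_enc)
    a_corrupt = sae_encode(x_corrupt, W_enc, b_enc)
    return [
        (a_c - a_0) * dot(d, grad_x)
        for a_0, a_c, d in zip(a_clean, a_corrupt, W_dec)
    ]

# Toy setup: 2-dim activation, 3 SAE features, linear metric L(x) = w . x,
# so dL/dx is just w (assumed here to keep the example checkable by hand).
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, -1.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_metric = [2.0, -1.0]

x_clean = [1.0, 2.0]
x_corrupt = [3.0, 0.5]

attributions = sae_feature_attributions(
    x_clean, x_corrupt, w_metric, W_enc, b_enc, W_dec
)
print(attributions)
```

Because the toy metric is linear, the per-feature scores sum exactly to the change in the metric between the two SAE reconstructions. In a real model the gradient would come from a backward pass and the scores would only be a first-order approximation.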