[Proposal] Support SAE attribution patching via path patching

Assuming we implement #5, we could natively support SAE node attributions without splicing, as follows:

The SAE feature act node computes its activation from the model act cache (via matmul + ReLU).
The downstream hook node computes its gradient from the model grad cache (in the normal way).
dDest / dSrc has to be handled carefully...
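
A minimal sketch of the two node computations above, using NumPy rather than the actual sae-eap code. Everything here is a hypothetical stand-in: the weights `W_enc`, `b_enc`, `W_dec` are random, the cached activations and gradient are random vectors, and the sign convention (corrupted minus clean, times the gradient) is an assumption, not something this issue fixes. It also assumes the cached gradient is taken at the SAE output.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 4, 8

# Hypothetical SAE parameters (random stand-ins for a trained SAE).
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = rng.normal(size=d_sae)
W_dec = rng.normal(size=(d_sae, d_model))

def sae_feature_acts(model_act):
    """SAE feature activations from a cached model activation: matmul + ReLU."""
    return np.maximum(model_act @ W_enc + b_enc, 0.0)

# Stand-ins for entries of the model act cache (clean and corrupted runs)
# and of the model grad cache (dMetric/d(SAE output) on the clean run).
clean_act = rng.normal(size=d_model)
corrupt_act = rng.normal(size=d_model)
grad_at_sae_output = rng.normal(size=d_model)

clean_feats = sae_feature_acts(clean_act)
corrupt_feats = sae_feature_acts(corrupt_act)

# Pull the cached gradient back through the (linear) decoder to get
# dMetric/d(feature act), one value per SAE feature.
grad_at_feats = W_dec @ grad_at_sae_output

# First-order attribution per SAE feature (assumed sign convention).
feature_attr = (corrupt_feats - clean_feats) * grad_at_feats
```

Because the decoder is linear, the per-feature attributions sum to the attribution of the whole decoded SAE output, which gives a cheap sanity check.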

Insight: SAE attributions can be found via path patching.

Consider attribution-patching the path: SAE feature act -> SAE output -> Dest w.r.t. Metric.
- Because SAE output is a "forward blanket" of SAE feature act, this reduces to attribution-patching the edge SAE act post -> Dest w.r.t. Metric.
Similarly, consider attribution-patching the path: Src -> SAE input -> SAE act post.
- Because SAE input is a "backward blanket" of SAE act post, this reduces to attribution-patching the edge Src -> SAE act post w.r.t. Metric.

How do we do path attribution patching? I think it's just the chain rule applied to EAP.
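
One way to read "chain rule applied to EAP" is sketched below in NumPy. This is a toy under strong assumptions, not the proposed implementation: Src is assumed to write directly into the SAE input, Dest is assumed to read the SAE output directly (so dDest / dSAE output is the identity, which sidesteps the dDest / dSrc subtlety noted above), and all weights and cached values are random, hypothetical stand-ins. The function `path_attributions` is an illustrative name, not an existing API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 4, 8

# Hypothetical SAE parameters (random stand-ins for a trained SAE).
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = rng.normal(size=d_sae)
W_dec = rng.normal(size=(d_sae, d_model))

sae_input_clean = rng.normal(size=d_model)  # clean-run SAE input (act cache)
delta_src = rng.normal(size=d_model)        # corrupted minus clean Src output
grad_at_dest = rng.normal(size=d_model)     # dMetric/dDest (grad cache)

def path_attributions(sae_input, delta_src, grad_at_dest):
    """Per-feature attribution of Src -> SAE act post -> Dest w.r.t. Metric."""
    pre = sae_input @ W_enc + b_enc
    relu_mask = (pre > 0).astype(float)  # derivative of ReLU on the clean run
    # Src -> SAE input -> SAE act post: first-order change in each feature.
    delta_feats = (delta_src @ W_enc) * relu_mask
    # SAE act post -> SAE output -> Dest: dMetric/d(feature act).
    grad_at_feats = W_dec @ grad_at_dest
    # Chain rule: multiply the upstream change by the downstream gradient.
    return delta_feats * grad_at_feats

attr = path_attributions(sae_input_clean, delta_src, grad_at_dest)
```

Summing `attr` over features recovers the ordinary first-order attribution of the Src change through the whole SAE, so the per-feature values act as a decomposition of the unspliced edge attribution in this toy setting.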

dtch1997 / sae-eap