callummcdougall / sae_vis

Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
MIT License
158 stars, 32 forks

Need support for gated SAE #54

Open yangjingyuan opened 4 months ago

yangjingyuan commented 4 months ago

AssertionError: If encoder isn't an AutoEncoder, it should have weights 'W_enc', 'W_dec', 'b_enc', 'b_dec'

Gated SAEs do not have a b_enc weight, so it seems the AutoEncoder class is not suitable for gated SAEs.
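For illustration, the failure can be reproduced with a minimal sketch of an attribute check like the one the error message describes. This is not sae-vis's actual code: `missing_weights`, `StandardSAE`, and `GatedSAE` are hypothetical names, and the gated parameter names (`b_gate`, `b_mag`, `r_mag`) are an assumption based on the usual gated SAE formulation.

```python
# Weight names taken from the AssertionError message above.
REQUIRED_WEIGHTS = ("W_enc", "W_dec", "b_enc", "b_dec")


def missing_weights(encoder):
    """Return the required weight names that `encoder` does not expose."""
    return [name for name in REQUIRED_WEIGHTS if not hasattr(encoder, name)]


class StandardSAE:
    """Hypothetical stand-in for a standard SAE (values are placeholders)."""

    def __init__(self):
        self.W_enc, self.W_dec = [[0.0]], [[0.0]]
        self.b_enc, self.b_dec = [0.0], [0.0]


class GatedSAE:
    """Hypothetical stand-in for a gated SAE.

    Assumption: the single encoder bias is replaced by separate gate and
    magnitude biases plus a per-feature rescaling vector.
    """

    def __init__(self):
        self.W_enc, self.W_dec = [[0.0]], [[0.0]]
        self.b_gate, self.b_mag, self.r_mag = [0.0], [0.0], [0.0]
        self.b_dec = [0.0]


print(missing_weights(StandardSAE()))
print(missing_weights(GatedSAE()))
```

Under these assumptions only `b_enc` is reported missing for the gated variant, which matches the assertion that is raised.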

callummcdougall commented 4 months ago

Note that there's a fork of this repo which is used for sae-lens. If most people will be working with sae-lens going forward, I think the probability of building in support for gated SAEs here is fairly low. I'd update on that if a lot of people are using sae-vis but not sae-lens. (One caveat: I'm mostly too busy to keep contributing to this library in major ways, so someone else would probably be implementing gated SAEs if it does end up happening here.)

yangjingyuan commented 4 months ago

sae-vis is a great visualization tool. It might be convenient if the AutoEncoder class in sae-vis were a subtype of SAE (sae-lens) or HookedRootModule (transformer-lens); then updates to those libraries would be relatively easy to pick up.

callummcdougall commented 4 months ago

Makes sense, although I would guess there's also an advantage to supporting general non-SAE-lens architectures? Unless it's easy enough to wrap everything in an SAE from sae-lens that it's fine to assume it's sae-lens type?
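One way to picture the "general architectures" option is a tool that depends only on an `encode` method rather than on specific weight names. The sketch below is hypothetical throughout: `FeatureEncoder`, `TinyGatedSAE`, and `top_feature` are illustrative names, not part of sae-vis or sae-lens, and the gated encoder follows the formulation in the gated SAE paper as I understand it (a binary gate multiplied by a ReLU magnitude, with magnitude weights equal to the gate weights rescaled by `exp(r_mag)`).

```python
import math
from typing import List, Protocol, Sequence


class FeatureEncoder(Protocol):
    """Hypothetical minimal interface: maps one activation vector to feature activations."""

    def encode(self, x: Sequence[float]) -> List[float]: ...


class TinyGatedSAE:
    """Toy gated encoder in pure Python; W_enc is a nested list of shape [d_in][d_sae]."""

    def __init__(self, W_enc, b_gate, b_mag, r_mag, b_dec):
        self.W_enc, self.b_gate, self.b_mag = W_enc, b_gate, b_mag
        self.r_mag, self.b_dec = r_mag, b_dec

    def encode(self, x: Sequence[float]) -> List[float]:
        # Centre the input by subtracting the decoder bias.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        acts = []
        for j in range(len(self.b_gate)):
            pre = sum(c * self.W_enc[i][j] for i, c in enumerate(centered))
            # Gate path: binary decision on whether the feature is active.
            gate = 1.0 if pre + self.b_gate[j] > 0 else 0.0
            # Magnitude path: shared weights rescaled by exp(r_mag), then ReLU.
            mag = max(0.0, pre * math.exp(self.r_mag[j]) + self.b_mag[j])
            acts.append(gate * mag)
        return acts


def top_feature(encoder: FeatureEncoder, x: Sequence[float]) -> int:
    """Index of the most active feature; relies only on `encode`."""
    acts = encoder.encode(x)
    return max(range(len(acts)), key=acts.__getitem__)


sae = TinyGatedSAE(
    W_enc=[[1.0, -1.0], [0.0, 1.0]],
    b_gate=[0.0, 0.0],
    b_mag=[0.5, 0.0],
    r_mag=[0.0, 0.0],
    b_dec=[0.0, 0.0],
)
print(sae.encode([2.0, 1.0]))  # [2.5, 0.0]
```

With an interface like this, a gated SAE never needs to expose `b_enc`; the trade-off is that any feature relying on direct access to the weights would need its own accessor.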

jbloomAus commented 4 months ago

https://github.com/jbloomAus/SAEDashboard supports any SAE Lens SAE. Planning to promote more widely shortly.

yangjingyuan commented 4 months ago

I see. It would be better to have a unified mechanistic interpretability tool (combining transformer-lens, sae-lens, sae-vis, ...) that can do circuit discovery, SAE training, visualization, activation steering, etc., and that can scale across multiple nodes and GPUs for large LLMs.