callummcdougall / sae_vis

Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).

Prompt-centric vis gone wild. #53

Open Pe4enkazAMI opened 4 months ago

Pe4enkazAMI commented 4 months ago

Hi, I have a question regarding the prompt-centric visualiser; it seems there are some issues with the code... or maybe I am doing it wrong.

Here it is:

prompt = "<some long text>"  # placeholder for the actual prompt
filename = "_prompt_vis_demo.html"
sae_vis_data.save_prompt_centric_vis(
    prompt = prompt,
    filename = filename
)

I use the code above for visualisation. However, no matter which model I use or which prompt I evaluate, it always returns the following error:

AssertionError: Key not found in scores_dict.keys()=dict_keys([]). This means that there are no features with a nontrivial score for this choice of key & metric.

The error occurs even when I use a pretrained model from the SAELens demo.

I thought it might be because my SAE is too sparse, but that didn't seem to be the case the last time I checked. I would really appreciate it if you could at least point me in the right direction with this issue. Thanks in advance.
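
For reference, here is roughly the kind of sparsity check I mean (a rough sketch; model, sae.encode, and the hook point below stand in for my own setup and are not part of the sae_vis API):

# Rough sketch of a sparsity check; `model`, `sae`, and the hook point are
# placeholders for my own setup, not part of the sae_vis API.
tokens = model.to_tokens(prompt)                    # (1, seq_len), TransformerLens-style
_, cache = model.run_with_cache(tokens)
resid = cache["blocks.0.hook_resid_pre"]            # wherever the SAE was trained
feature_acts = sae.encode(resid)                    # (1, seq_len, n_features)
fired = (feature_acts > 0).reshape(-1, feature_acts.shape[-1]).any(dim=0)
print(f"{int(fired.sum())} / {fired.numel()} features fire somewhere on this prompt")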

callummcdougall commented 4 months ago

Thanks for flagging this, I'll attempt to have a look at it later this week. Is this directly running code from the current version of the demo notebook / Colab, and if not then could you send me a Colab which recreates the exact error? Thanks!

Pe4enkazAMI commented 4 months ago

Hi again! Here is the Colab with the error. Thanks!

callummcdougall commented 4 months ago

Hmm, I'm hitting an error earlier than the last cell. Is this a versioning thing?

[screenshot of the error]

Pe4enkazAMI commented 4 months ago

Yes, sorry for the inconvenience. I have updated the notebook; it should work now.

callummcdougall commented 4 months ago

Hey,

I've reproduced the error now. I had to make a couple of changes to the notebook (pip installing transformer_lens, plus adding some more code to download the same tokens data I used in my demo notebook). Trying the example with these steps taken, it seems to work (although maybe you used a different dataset; I can't tell, because the token_dataset object that you defined isn't in the Colab).

The specific example here is quite different from the demo colab - it uses a different prompt, and also a different layer I believe. Looking at the error message reveals the problem. The error message is:

AssertionError: Key act_quantile|'Mary' (0) not found in scores_dict.keys()=dict_keys(["act_size|'Ġkills' (1)", "act_quantile|'Ġkills' (1)", "loss_effect|'ĠJoe' (2)"]).

which shows that the scores dict isn't empty, i.e. some tokens have active features (specifically tokens 1 & 2 are active on certain features). Passing the arguments:

seq_pos = 1,
metric = "act_quantile",

into this function (so our default view is a valid one) shows us that feature 6 has an effect on the output. However, I do think that this behaviour is pretty suboptimal (the first key should be replaced with a valid one if it's in fact not valid), so I've made this change to the library now. Thanks for flagging this!
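
For reference, a rough sketch of what that call could look like (assuming save_prompt_centric_vis accepts metric and seq_pos keyword arguments, which pick the default view):

# Rough sketch: pick a default view which is known to be valid from the error message.
# Assumes save_prompt_centric_vis accepts `metric` and `seq_pos` keyword arguments.
sae_vis_data.save_prompt_centric_vis(
    prompt=prompt,
    filename=filename,
    seq_pos=1,               # token 1 ('Ġkills') has features with nontrivial scores
    metric="act_quantile",   # a metric that appears in scores_dict for that token
)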

Best, Callum


Pe4enkazAMI commented 4 months ago

Hi! Sorry for the invalid notebook, I had overlooked the dataset loading. Anyway, I used "NeelNanda/pile-10k" from the demo notebook, but I guess it doesn't matter, since I got the same error on other datasets too. The same holds for different prompts; none worked.

Anyway, thanks for the help!

Any thoughts on why this error occurs? Is it a direct consequence of the SAE being too sparse?

Also, I should mention that I used the e2e_sae repo for training my SAE; it is slightly different from SAELens training. When I switched back to SAELens, the prompt-centric vis worked.

Kind Regards, Ian

callummcdougall commented 4 months ago

It's not that the SAE is too sparse; SAEs are meant to be sparse. It's just that if you have a feature which doesn't fire on any of the inputs in your batch, then you'll get an error if you try to look at that feature's dashboard. As mentioned in the previous email, that's now been fixed (it now defaults to opening the vis on the first non-zero feature).
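
To illustrate the shape of that fix (a simplified sketch, not the library's actual implementation): if the requested key isn't present in scores_dict, fall back to the first valid key instead of asserting.

# Simplified sketch of the fallback behaviour described above; resolve_default_key is
# a hypothetical helper, not the library's actual code.
def resolve_default_key(scores_dict: dict, requested_key: str) -> str:
    if requested_key in scores_dict:
        return requested_key
    if not scores_dict:
        raise ValueError("No features have a nontrivial score for this prompt.")
    return next(iter(scores_dict))  # e.g. "act_size|'Ġkills' (1)"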


Pe4enkazAMI commented 4 months ago

Ok, got it! Thank you!