TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License
1.17k stars 241 forks source link

[Proposal] Remove the overhead caused by full_hook.__name__ = (hook.__repr__())? #631

Open verlocks opened 3 weeks ago

verlocks commented 3 weeks ago

Proposal

Remove the full_hook._name_ = (hook._repr_()) line in HookPoint.add_hook, if this is not really necessary.

Motivation

I am developing a forward hook recently, and use functools.partial. When I run the code I find it is very slow. After a long time of dig up, I realized that because I have a Dict[device, tensor] store some tensors on several device (for not need to copy tensor between devices when running hooks) as a parameter in functools.partial, when running hook._repr_() it will query the _repr_ of Dict and then query the tensors inside Dict, which causes a long overhead when hook._repr_() run a lot of times. I spent a long time and found this problem and maybe it is better to just remove it if not necessary? I am not sure here, maybe I just shouldn't write code in this way, what do you think?

Pitch

Remove full_hook._name_ = (hook._repr_()) if not necessary.

Alternatives

Avoid using hook._repr_().

Checklist

bryce13950 commented 3 weeks ago

If you have time to submit a PR for this, I would be happy to accept a change that gives an option to add_hook to skipping the name. Maybe something like skip_verbose_naming, and if it is true, then simply skip that line. We cannot remove it, or make skipping it the default behavior, as I am sure many people are using this. Give an option to skip it seems acceptable to me though.