Right now, the library treats MLPs as atomic blocks, which is fine. However, with sparse autoencoders (SAEs), we could decompose their activations into a sparse set of interpretable features. The SAE approach is somewhat sensitive to hyperparameters, but it would still be an interesting integration.
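For reference, here is a minimal NumPy sketch of one common SAE formulation applied to a batch of MLP activations. It assumes a ReLU encoder, a linear decoder, and an L1 sparsity penalty. The function names `sae_forward` and `sae_loss` are hypothetical and are not part of the library. The weights in the demo are random and untrained, so the output only shows shapes and the data flow, not real features.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode MLP activations x of shape (batch, d_mlp) into features and reconstruct them.

    Hypothetical helper: W_enc is (d_mlp, n_features), W_dec is (n_features, d_mlp).
    """
    # Encoder: subtract the decoder bias, project into feature space, and apply ReLU
    # so that feature activations are non-negative (and often exactly zero).
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Decoder: each active feature adds its dictionary direction back into MLP space.
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff):
    """Reconstruction error plus an L1 penalty that pushes features toward sparsity."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return float(recon + l1_coeff * sparsity)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_mlp, n_features, batch = 16, 64, 8  # illustrative sizes only
    x = rng.normal(size=(batch, d_mlp))   # stand-in for cached MLP activations
    W_enc = rng.normal(scale=0.1, size=(d_mlp, n_features))
    W_dec = rng.normal(scale=0.1, size=(n_features, d_mlp))
    b_enc, b_dec = np.zeros(n_features), np.zeros(d_mlp)

    f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
    print("feature activations:", f.shape, "reconstruction:", x_hat.shape)
    print("active features per example:", (f > 0).sum(axis=-1))
    print("loss:", sae_loss(x, f, x_hat, l1_coeff=1e-3))
```

The dictionary size (`n_features`) and `l1_coeff` are typical examples of the hyperparameters that trade reconstruction quality against sparsity, which is one source of the sensitivity mentioned above.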
Agreed. The ACDC library also has node-connection information at the attention-head and MLP-neuron level, which is another reason to move to finer granularity over time.