DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding
MIT License

Multilayer graphing attributes 0.0 explained and 0.0 unexplained to some earlier layers #114

Closed: DavidUdell closed this issue 1 month ago

DavidUdell commented 2 months ago

It's a vanishing gradients issue, I'm fairly sure. The autoencoders seem to be thinning out the gradients excessively once you're far enough back in the model, and a zero gradient tensor at any point will totally wipe out otherwise reasonable activation values.
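For intuition, here's a minimal sketch (not the repo's code; the shapes and variable names are made up) of how a zeroed gradient tensor wipes out gradient-times-activation attributions regardless of how healthy the activations are:

```python
import torch

# Hypothetical: autoencoder feature activations at an earlier layer, plus the
# gradient of the downstream metric with respect to them.
acts = torch.randn(4, 512)          # reasonable, nonzero activations
grads = torch.zeros_like(acts)      # gradients that vanished to exactly 0.0

# Gradient-times-activation attribution: with all-zero grads, every
# attribution is 0.0 no matter how large the activations are.
attributions = acts * grads
print(attributions.abs().sum())     # tensor(0.)
```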

DavidUdell commented 1 month ago

I'm torn on whether this is actually a vanishing gradients issue or a true observation in the data. A couple of fixes that should have helped with vanishing gradients (moving everything to float64, loss scaling) didn't change the results. But there may be an implementation-level detail in autograd that I'm not fully understanding. For now, I have a quick patch to prevent fatal crashes in this case, working on the assumption that this is an observation and not a bug.
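For reference, a rough sketch of the two mitigations tried, shown against a toy module rather than the actual pipeline; the scale factor and structure here are assumptions, not the repo's implementation:

```python
import torch

model = torch.nn.Linear(8, 1).double()      # fix 1: run everything in float64
x = torch.randn(4, 8, dtype=torch.float64)

LOSS_SCALE = 2.0**16                        # fix 2: loss scaling

loss = model(x).pow(2).mean()
(loss * LOSS_SCALE).backward()              # scale the loss up before backprop...
for p in model.parameters():
    if p.grad is not None:
        p.grad /= LOSS_SCALE                # ...then scale the grads back down
```

If the zeros were a numerical-precision artifact, either of these would be expected to move the attribution values off 0.0; neither did, which is the reason for leaning toward "true observation" here.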