The Automated Circuit Discovery (ACDC) library (https://github.com/ArthurConmy/Automatic-Circuit-Discovery) contains code that analyses transformer models, detects which nodes depend on which other nodes, and graphs the results. This "node dependency chain" information would help a researcher understand a transformer model. Our filtering could then be extended to cover "time-ordering" dependencies such as "D3.ST4 node depends on D2.ST3 node".
This issue covers:
Become familiar with the ACDC library.
Try to extend it to run against our transformer models (stored on HuggingFace)
Try to extract the "node dependency chain" results to an xxxxx_ACDC.json file, in a format similar to our xxxxx_behavior.json file
For each of our HuggingFace models, upload the corresponding xxxxx_ACDC.json file to HuggingFace
Extend our Colab / Python code to read the xxxxx_ACDC.json file into memory
Extend our Colab / Python code to graph the xxxxx_ACDC.json data (using Arthur's approach)
Consider whether we can usefully extend our existing quanta maps to include some of this dependency information
Consider how we extend our UsefulNode and Filter classes to cover this new dependency information (This may be the first MLP-neuron-level information we have. If so, this is a non-trivial change to the library classes)
At some point contact Arthur and see whether he is still maintaining the project, explain the above, and see if he is interested in helping us do the above
(I believe Arthur's code currently uses a locally modified version of Transformer Lens, so fully integrating his code base with ours would likely involve: 1) retrofitting his Transformer Lens changes back into the mainstream Transformer Lens library, 2) simplifying the ACDC library to use the (newly improved) mainstream Transformer Lens library, and 3) importing his (shrunken) library into our library. We should bring this work into scope, as a separate ticket, only if Arthur is willing to help us in some way.)
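As a starting point for the export, read-into-memory and filtering tasks above, here is a minimal sketch of one possible xxxxx_ACDC.json layout. Everything in it is an assumption for discussion: the record keys ("node", "depends_on"), the helper names (save_acdc_edges, load_acdc_edges, upstream_nodes), the file name example_ACDC.json and the extra node label "D1.ST1" are all hypothetical. It does not reflect the actual xxxxx_behavior.json format or any ACDC output format, and it treats node names as opaque strings.

```python
import json
import tempfile
from pathlib import Path


def save_acdc_edges(path, edges):
    """Write (node, depends_on) pairs to a JSON file (hypothetical layout)."""
    records = [{"node": child, "depends_on": parent} for child, parent in edges]
    Path(path).write_text(json.dumps(records, indent=2))


def load_acdc_edges(path):
    """Read the JSON file back into a list of (node, depends_on) tuples."""
    records = json.loads(Path(path).read_text())
    return [(record["node"], record["depends_on"]) for record in records]


def upstream_nodes(edges, node):
    """Return every node that `node` depends on, directly or indirectly."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Illustrative data only: the first pair is the example from this issue,
# the second pair uses a made-up node label.
edges = [("D3.ST4", "D2.ST3"), ("D2.ST3", "D1.ST1")]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_ACDC.json"
    save_acdc_edges(path, edges)
    loaded = load_acdc_edges(path)

print(sorted(upstream_nodes(loaded, "D3.ST4")))
```

A flat list of edge records keeps the file easy to diff and easy to load into a graphing step later. If the real behavior file is keyed per node instead, the same three helpers could be adapted without changing the calling code.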