DavidUdell/sparse_circuit_discovery
Circuit discovery in GPT-2 small, using sparse autoencoding
MIT License
6 stars, 1 fork
issues
#70 In `validate_circuits.py`, replicate the top-examples of `cognition_graph_webtext.py` (DavidUdell, closed, 6 months ago, 0 comments)
#69 Plot with `t.topk()`, rather than `np.random.choice()`? (DavidUdell, closed, 6 months ago, 0 comments)
#68 Feature/chat (DavidUdell, closed, 6 months ago, 0 comments)
#67 Implement MAX_SEQ_PER_DIM hyperparameter to address inference slowdowns (DavidUdell, closed, 6 months ago, 0 comments)
#66 Rename to `cognition_graph_<dataset>` (DavidUdell, closed, 6 months ago, 0 comments)
#65 Plausibly, I should just precompute interp data for _all_ model layers, and commit those .csv files to GitHub (DavidUdell, closed, 6 months ago, 0 comments)
#64 bleach.clean() is too aggressively sanitizing (DavidUdell, closed, 6 months ago, 1 comment)
#63 Round negligible feature activation floats to 0.0 (DavidUdell, closed, 6 months ago, 0 comments)
#62 Update Readme (DavidUdell, closed, 6 months ago, 2 comments)
#61 Print out validation datapoints (DavidUdell, closed, 6 months ago, 0 comments)
#60 Try plotting only logits most negatively affected, during ablations. (DavidUdell, closed, 6 months ago, 0 comments)
#59 Make hook ablations more surgical in each sequence? (DavidUdell, closed, 6 months ago, 0 comments)
#58 Strengthen hook residual for `output[0]` (DavidUdell, closed, 7 months ago, 0 comments)
#57 Restore `top_tokens.py`--it's too hard right now to guess what features do (DavidUdell, closed, 6 months ago, 1 comment)
#56 Write logits comparison in `chat.py` (DavidUdell, closed, 7 months ago, 2 comments)
#55 Update showcase `header.png` image (DavidUdell, closed, 6 months ago, 0 comments)
#54 Bump semver version for the post (DavidUdell, closed, 7 months ago, 0 comments)
#53 Optimize for (almost) linear approximations of transformer layers directly (DavidUdell, closed, 7 months ago, 0 comments)
#52 Restore arbitrary autoencoder-training functionality, in the new untied setup (DavidUdell, closed, 7 months ago, 0 comments)
#51 I think HTML token highlights might be off by +1 positions to the right (DavidUdell, closed, 7 months ago, 0 comments)
#50 Add Joseph's license in his directory (DavidUdell, closed, 7 months ago, 0 comments)
#49 Bugfix/tests (DavidUdell, closed, 7 months ago, 0 comments)
#48 Fix smoke test (DavidUdell, closed, 7 months ago, 0 comments)
#47 Feature/runs (DavidUdell, closed, 7 months ago, 0 comments)
#46 Selective ablations at known meaningful features (DavidUdell, closed, 7 months ago, 0 comments)
#45 Reduce RAM load in `contexts.py` (DavidUdell, closed, 7 months ago, 1 comment)
#44 Ablation hook residuals (DavidUdell, closed, 7 months ago, 2 comments)
#43 Restore interpretable autoencoder `.csv`s (DavidUdell, closed, 7 months ago, 1 comment)
#42 Pull down @JosephBloom SAEs for GPT-2 (DavidUdell, closed, 7 months ago, 0 comments)
#41 Implement Anthropic-style resampling (DavidUdell, closed, 7 months ago, 1 comment)
#40 Circuit-level validation (DavidUdell, closed, 6 months ago, 2 comments)
#39 Add one to the upper view slice in `contexts.py` (DavidUdell, closed, 7 months ago, 0 comments)
#38 Larger fonts for neuron label appendables (DavidUdell, closed, 7 months ago, 0 comments)
#37 Final `header.png` (DavidUdell, closed, 7 months ago, 1 comment)
#36 Implement a learning rate warmup (DavidUdell, closed, 7 months ago, 1 comment)
#35 Top contexts labels (DavidUdell, closed, 7 months ago, 2 comments)
#34 Rank nodes in graphs by layer index (DavidUdell, closed, 7 months ago, 1 comment)
#33 directed_graph_mc specifically is failing smoke tests. (DavidUdell, closed, 8 months ago, 0 comments)
#32 Branching factor isn't working during autographing. (DavidUdell, closed, 8 months ago, 0 comments)
#31 Autofiltering to only the affected features isn't working as expected. (DavidUdell, closed, 8 months ago, 0 comments)
#30 Frac features not plotted breaks in recursive plotting (DavidUdell, closed, 8 months ago, 1 comment)
#29 Top logit-effect labels (DavidUdell, closed, 7 months ago, 0 comments)
#28 Layer indexed DIMS_PLOTTED_LIST (DavidUdell, closed, 8 months ago, 0 comments)
#27 Add COEFFICIENT YAML constant. (DavidUdell, closed, 8 months ago, 0 comments)
#26 Spin off feature_web_rasp. (DavidUdell, closed, 8 months ago, 0 comments)
#25 Add a THINNING_FACTOR YAML constant for full-scale graphing. (DavidUdell, closed, 8 months ago, 0 comments)
#24 NoneType error in the .dot file AGraph caching process, somewhere. (DavidUdell, closed, 8 months ago, 0 comments)
#23 Feature/logging (DavidUdell, closed, 8 months ago, 0 comments)
#22 `wandb` logging (DavidUdell, closed, 8 months ago, 0 comments)
#21 Feature/datapoints (DavidUdell, closed, 8 months ago, 0 comments)