Closed DavidUdell closed 2 weeks ago
E.g., there is no attn_to_res edge; attn only maps to mlp. Keep only the subset of the 36 possibilities for edges (36 with error terms) that topologically match GPT-2.
res_to_mlp edges actually needed to be preserved for double-counting correction, then skipped at graphing time. Done.
E.g., there is no attn_to_res edge; attn only maps to mlp. Keep only the subset of the 36 possibilities for edges (36 with error terms) that topologically match GPT-2.