PhilipQuirke / quanta_maths

Tool used to verify accuracy of transformer model
Apache License 2.0

Enhance Maths 99.9999% code to ablate all non-useful nodes #26

Open PhilipQuirke opened 7 months ago

PhilipQuirke commented 7 months ago

Read the paper "Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms" (https://arxiv.org/pdf/2403.17806.pdf). The authors argue that "A circuit is faithful if all model edges outside the circuit can be ablated without changing the model’s performance on the task; faithfulness is what justifies studying circuits, rather than the full model."

This ticket covers enhancing the existing "Maths 99.9999% test" code so that it ablates all nodes considered "not useful" while it makes its 1M predictions. A successful run of this code would then be evidence of both model accuracy and useful node list accuracy, rather than of model accuracy alone: if the list wrongly omitted a node the model relies on, ablating that node should cause prediction failures.
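As a rough illustration of the check described above, here is a minimal sketch on a toy stand-in for a model. Everything in it is hypothetical: the node names, the `NODES` table, and the helper functions are invented for the example and are not the quanta_maths API. It also assumes mean ablation (replacing a node's output with its mean over a reference sample), which is only one possible choice of ablation.

```python
import random
from statistics import fmean

# Hypothetical toy "model": each node adds a contribution to the answer.
# Node names mimic layer/head labels; they are NOT taken from quanta_maths.
NODES = {
    "L0H0": lambda x, y: float(x),  # useful: carries the first operand
    "L0H1": lambda x, y: float(y),  # useful: carries the second operand
    "L1H0": lambda x, y: 0.25,      # not useful: constant output
    "L1H1": lambda x, y: 0.0,       # not useful: contributes nothing
}


def mean_outputs(questions):
    """Mean output of every node over a reference sample (for mean ablation)."""
    return {
        name: fmean(fn(x, y) for x, y in questions)
        for name, fn in NODES.items()
    }


def predict(x, y, ablated=frozenset(), means=None):
    """Sum node outputs, replacing each ablated node's output with its mean."""
    total = 0.0
    for name, fn in NODES.items():
        total += means[name] if name in ablated else fn(x, y)
    return round(total)


def count_failures(questions, useful_nodes):
    """Ablate every node outside useful_nodes and count wrong predictions."""
    means = mean_outputs(questions)
    ablated = frozenset(NODES) - frozenset(useful_nodes)
    return sum(predict(x, y, ablated, means) != x + y for x, y in questions)


rng = random.Random(0)
questions = [
    (rng.randrange(100_000), rng.randrange(100_000)) for _ in range(10_000)
]

# Correct useful node list: ablating the rest leaves every prediction intact.
good = count_failures(questions, useful_nodes={"L0H0", "L0H1"})
# Faulty list that wrongly omits L0H1: ablating it breaks predictions.
bad = count_failures(questions, useful_nodes={"L0H0"})
print(good, bad)
```

In this toy setup `good` is 0 while `bad` is non-zero, which is the signal the enhanced test is meant to give: a clean run supports the useful node list, and failures point to a node that was wrongly left off it.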

Once this enhancement is working, run it on all existing accurate HuggingFace models (add_d5_l2_h3_t15K, add_d6_l2_h3_t15K and ins1_mix_d6_l3_h4_t40K at the time of writing). If any of these models shows prediction failures, create a new ticket to investigate.