Our 2-layer addition model achieved 99.9999% accuracy using the TriCase/TriAdd approach.
Our 1-layer addition model only achieved 99% accuracy. It never learnt the TriCase/TriAdd approach.
But the 2-layer model only uses the attention heads in one layer.
Perhaps a 1-layer addition model cannot learn TriCase/TriAdd from scratch, but perhaps the accurate 2-layer model can be pruned and retrained to give an accurate 1-layer addition model.
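The pruning idea above can be sketched as filtering a checkpoint's state dict down to a single transformer block. This is a minimal illustration, not the actual library code: the `blocks.0.*` / `blocks.1.*` key convention follows TransformerLens-style naming, and the real checkpoint keys for `add_d6_l2_h3_t15K_s372001` may differ.

```python
# Hypothetical sketch of pruning a 2-layer state dict to 1 layer.
# Assumes TransformerLens-style keys ("blocks.<n>.<param>"); the real
# checkpoint layout should be checked before relying on this.

def prune_to_one_layer(state_dict, keep_layer=0):
    """Keep embeddings/unembeddings plus one transformer block,
    renumbering the kept block to blocks.0 so it loads into a 1-layer model."""
    kept = {}
    for key, value in state_dict.items():
        if key.startswith("blocks."):
            layer = int(key.split(".")[1])
            if layer != keep_layer:
                continue  # drop all weights belonging to the pruned layer
            # renumber the surviving block to index 0
            key = "blocks.0." + key.split(".", 2)[2]
        kept[key] = value
    return kept

# Toy example using string placeholders instead of weight tensors.
sd = {
    "embed.W_E": "E",
    "blocks.0.attn.W_Q": "q0",
    "blocks.1.attn.W_Q": "q1",
    "unembed.W_U": "U",
}
pruned = prune_to_one_layer(sd, keep_layer=0)
```

If the useful attention heads turn out to live in layer 1 rather than layer 0, `keep_layer=1` keeps that block instead (and the renumbering makes it block 0 of the new model).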
This ticket covers:
Migrate the existing "insert mode 1, 2 & 3" code from the VerifiedArithmeticTrain Colab to a new file train.py in the Python library
Strengthen the code to handle inserting a 2-layer model into a 1-layer model (prune away the unused layer's weights).
Use VerifiedArithmeticTrain to train a new model "ins1_add_d6_l1_h3_t15K_s572091" in the normal way - inserting the (pruned) model "add_d6_l2_h3_t15K_s372001"
In this ticket record the AvgFinalLoss and FinalLoss of the new model (as calculated by the Colab).
From the Colab temporary files, download the new model files ins1_add_d6_l1_h3_t15K_s572091.pth and ins1_add_d6_l1_h3_t15K_s572091_train.json. Give these files to PQ to load to https://huggingface.co/PhilipQuirke/VerifiedArithmetic/
Use VerifiedArithmeticAnalyse to run the "99.9999% accuracy" test.
In this ticket record the results (as calculated by the Colab).
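The "99.9999% accuracy" test in the steps above amounts to checking model answers over a very large sample of n-digit addition questions. A minimal sketch of that kind of sampled evaluation is below; `predict` is a hypothetical stand-in for the trained model's answer function, and the real test lives in VerifiedArithmeticAnalyse.

```python
import random

def estimate_accuracy(predict, n_digits=6, n_samples=1_000_000, seed=0):
    """Estimate addition accuracy by sampling random n-digit operand pairs.
    `predict(a, b)` is assumed to return the model's integer answer."""
    rng = random.Random(seed)
    hi = 10 ** n_digits - 1
    correct = 0
    for _ in range(n_samples):
        a, b = rng.randint(0, hi), rng.randint(0, hi)
        if predict(a, b) == a + b:
            correct += 1
    return correct / n_samples

# Sanity check with a perfect oracle; the trained model's predictions
# would replace the lambda in a real run.
acc = estimate_accuracy(lambda a, b: a + b, n_samples=10_000)
```

Note that distinguishing 99.9999% from 99.99% accuracy needs on the order of millions of samples, which is why the Colab's large-sample test is the one to record.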