PhilipQuirke / verified_transformers

Tool used to verify accuracy of transformer model
Apache License 2.0

MATH: For ADD, SUB, NEG how is SC, MB and NB sub-task output structured #29

Open PhilipQuirke opened 5 months ago

PhilipQuirke commented 5 months ago

Read https://github.com/PhilipQuirke/verified_transformers/blob/main/mixed_model.md

In the ins1_mix_d6_l3_h4_t40K model, the attention head P18L0H0 performs the SC, MB and NB sub-tasks depending on the input question type. That is, the output is poly-semantic. How is the output structured so that later nodes can select either SC output or the MB output or the NB output? Investigate.

PCA the output of P18L0H0 for ADD, SUB, NEG questions with no SC/MB/NB features. PCA the output of P18L0H0 for ADD, SUB, NEG questions with a SC/MB/NB in answer digit A1 only. Do this for ADD, SUB, NEG separately and then for a batch of questions with all three question types. Compare the PCAs. What can we say about how the output of this node is structured, especially as it relates to later nodes selecting the SC output, the MB output or the NB output?
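As a rough illustration of the comparison described above, here is a minimal NumPy sketch of PCA over two groups of head-output vectors. The data is synthetic and `d_model = 16` is a hypothetical width; in real use the two arrays would be the cached P18L0H0 outputs for the "no feature" and "feature in A1" question groups. It is not the code used in the Colab.

```python
import numpy as np

def pca(acts, n_components=2):
    """Project row-wise samples onto their top principal components.

    Returns (projected points, explained-variance ratios)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # SVD of the centered data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    ratios = var[:n_components] / var.sum()
    return centered @ vt[:n_components].T, ratios

rng = np.random.default_rng(0)
d_model = 16  # hypothetical width, for illustration only

# Synthetic stand-ins for the head's output vectors (assumption, not real data).
direction = rng.normal(size=d_model)
no_feature = rng.normal(scale=0.1, size=(50, d_model))
with_feature = rng.normal(scale=0.1, size=(50, d_model)) + 3.0 * direction

points, ratios = pca(np.vstack([no_feature, with_feature]))
labels = np.array([0] * 50 + [1] * 50)

# If the feature is linearly encoded, the two groups separate along PC1.
gap = abs(points[labels == 0, 0].mean() - points[labels == 1, 0].mean())
print(f"PC1 explains {ratios[0]:.0%} of variance; group gap on PC1 = {gap:.2f}")
```

Running the same projection per question type (ADD, SUB, NEG) and then on the mixed batch would show whether the three sub-task outputs share one direction or occupy separate ones.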

(The Colab VerifiedArithmeticAnalyse.ipynb part 19B has a clause for the ins1_mix_d6_l3_h4_t40K model listing the interesting nodes for the ADD case in terms of their PCA results. It is an example of running PCA against groups of questions.)

amirabdullah19852020 commented 4 months ago

@PhilipQuirke - just to confirm, where you say:

PCA the output of P18L0H0 for ADD, SUB, NEG questions with no SC/MB/NB features.
PCA the output of P18L0H0 for ADD, SUB, NEG questions with a SC/MB/NB in answer digit A1 only. Do this for ADD, SUB, NEG separately and then for a batch of questions with all three question types.

These are not the questions in cfg.tricase_questions_dict, right? I'm going to have to update maths_test_questions to generate both of the question groups above and then run the PCA (or other analysis on attention outputs). (At least that's my understanding.)

PhilipQuirke commented 4 months ago

The below code indexes the tricase_questions_dict with MathsToken.PLUS/MINUS.

    for operation in [MathsToken.PLUS, MathsToken.MINUS]:
        t_questions = make_maths_tricase_questions_core(cfg, answer_digit, operation)
        # Use a tuple of (answer_digit, operation) as the key for indexing
        cfg.tricase_questions_dict[(answer_digit, operation)] = t_questions

This is effectively an index on ADD/SUB. There are no NEG questions stored at the moment. I think we should improve the make_maths_tricase_questions_core code to generate questions in all 3 cases (it currently only handles ADD/SUB), extend the tricase_questions_dict index to cover all 3 cases, and update any code that references the tricase_questions_dict.

Hopefully that is clear.

amirabdullah19852020 commented 4 months ago

Yes, although I'll do it so that by default it returns ADD and SUB cases unless the arguments change. (Mostly, don't want to have to update VerifiedArithmeticAnalyse right now, want to stick to a separate notebook.)
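One way the "default to ADD and SUB" behaviour could look is sketched below. The `QType` enum, `make_questions_stub` and `build_tricase_dict` are hypothetical names for illustration; the repo currently keys the dict on `MathsToken.PLUS`/`MathsToken.MINUS` and generates questions with `make_maths_tricase_questions_core`.

```python
from enum import Enum

class QType(Enum):
    # Hypothetical stand-in for the question type; not the repo's MathsToken.
    ADD = "add"
    SUB = "sub"
    NEG = "neg"

# Default keeps today's behaviour: only ADD and SUB are indexed.
DEFAULT_QTYPES = (QType.ADD, QType.SUB)

def make_questions_stub(answer_digit, qtype):
    """Placeholder for the real generator; returns a label, not questions."""
    return f"{qtype.name} questions for A{answer_digit}"

def build_tricase_dict(answer_digits, qtypes=DEFAULT_QTYPES):
    """Index question sets by (answer_digit, question type)."""
    return {
        (digit, qtype): make_questions_stub(digit, qtype)
        for digit in answer_digits
        for qtype in qtypes
    }

default_index = build_tricase_dict(range(2))                  # ADD/SUB only
full_index = build_tricase_dict(range(2), qtypes=tuple(QType))  # adds NEG
```

Existing callers that omit `qtypes` see the same keys as before, so a separate notebook can opt in to NEG without touching VerifiedArithmeticAnalyse.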

Note for self (and these should be pasted in each github issue):

ADD, SUB and NEG are addition, subtraction (positive answer) and subtraction (negative answer).
SC: Is Dn + Dn' >= 10? (boolean operation)
TRICASE: Dn + Dn' <= 8 (never carry), = 9 (carry iff a carry is coming in from the next lower digit), >= 10 (always carry).
MB/NB: Borrow one in positive-answer and negative-answer examples respectively. (Is Dn - Dn' < 0?)
MT, NT: Analog of tricase for positive-answer and negative-answer examples respectively.
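The TRICASE rule in the note above can be sketched as a carry cascade over digit pairs. This is plain arithmetic for illustration, not code from the repo; `carry_out` is a hypothetical helper name.

```python
def carry_out(pairs):
    """Resolve carries for an addition using the three TRICASE states.

    pairs: list of (d, d_prime) digit pairs, least significant first.
    Returns one boolean per position: does it carry into the next digit?"""
    carries = []
    carry_in = False
    for d, d_prime in pairs:
        total = d + d_prime
        if total >= 10:
            carry = True        # always carries
        elif total == 9:
            carry = carry_in    # carries only if a carry arrives from below
        else:
            carry = False       # sum <= 8 never carries, even with a carry in
        carries.append(carry)
        carry_in = carry
    return carries

# 195 + 807 = 1002: the lowest pair carries and the two 9-sums pass it along.
cascade = carry_out([(5, 7), (9, 0), (1, 8)])
# 190 + 807 = 997: the same 9-sums produce no carry without one coming in.
no_cascade = carry_out([(0, 7), (9, 0), (1, 8)])
```

The two examples differ only in the lowest digit pair, which is what makes the "= 9" state interesting: its outcome depends entirely on the digits below it.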

amirabdullah19852020 commented 4 months ago

For Math_SUB, is it true that only cases 9 and 10 are supported? Case 8 leads to x - y < 0, which might produce a negative answer. Or should we mitigate this by placing more significant digits above the tricase digit under test?

PhilipQuirke commented 4 months ago

Re "For Math_SUB, is it true that only cases 9 and 10 are supported?": Currently the code uses T8, T9 and T10 for both ADD and SUB. This is confusing.

For addition, we should rename them as:
ST8: Dn + D'n <= 8 and so never causes a carry
ST9: Dn + D'n = 9 and may cause a carry (if the next lower digit pair causes a carry)
ST10: Dn + D'n >= 10 and so always causes a carry

For subtraction, we should rename them as:
MT1: Dn - D'n >= 1 and so never causes a borrow one
MT0: Dn - D'n = 0 and may cause a borrow one (if the next lower digit pair causes a borrow one)
MT-1: Dn - D'n <= -1 and so always causes a borrow one

I believe that the make_maths_tricase_questions_core code implements these states but uses confusing names for them.

So in both cases there are 3 interesting states.
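A minimal sketch of the two proposed three-state classifications follows. The function names `add_state` and `sub_state` are hypothetical; the state labels are the ones proposed above, and nothing here is taken from make_maths_tricase_questions_core.

```python
from collections import Counter

def add_state(d, d_prime):
    """Classify a digit pair for addition using the proposed ST names."""
    total = d + d_prime
    if total >= 10:
        return "ST10"   # always causes a carry
    if total == 9:
        return "ST9"    # carries only if the next lower pair carries
    return "ST8"        # never causes a carry

def sub_state(d, d_prime):
    """Classify a digit pair for subtraction using the proposed MT names."""
    diff = d - d_prime
    if diff >= 1:
        return "MT1"    # never causes a borrow one
    if diff == 0:
        return "MT0"    # borrows only if the next lower pair borrows
    return "MT-1"       # always causes a borrow one

# Tally the states over all 100 possible digit pairs.
digit_pairs = [(d, d_prime) for d in range(10) for d_prime in range(10)]
add_counts = Counter(add_state(d, dp) for d, dp in digit_pairs)
sub_counts = Counter(sub_state(d, dp) for d, dp in digit_pairs)
```

Over all 100 digit pairs both schemes split 45/10/45, so the "maybe" state (ST9 or MT0) covers 10 pairs in each case.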

I don't think we need to differentiate between positive-answer subtraction and negative-answer subtraction here. When the model does the above calculations as sub-tasks, it is calculating whether D < D', which is equally interesting to both the positive-answer and negative-answer subtraction circuits.