expln / metamath-lamp

Metamath-lamp (Lite Assistant for Metamath Proofs) is a GUI-based proof assistant for creating formal mathematical proofs in Metamath that does not require installation (just run it directly using your web browser).
https://expln.github.io/lamp/latest/index.html
MIT License
12 stars 5 forks source link

Unclear why bottom-up prover is missing a possible proof (the one I've been using) #127

Closed david-a-wheeler closed 1 year ago

david-a-wheeler commented 1 year ago

I'm mystified by a behavior of metamath-lamp. Its bottom-up prover found some proofs, but not the one I've been using. What am I missing?

Here's the background. In the lamp-guide I present this as a proof of 2p2e4, which of course works. It uses addassi and eventually 3eqtr4ri in its justifications:

{"srcs":[{"typ":"Web","fileName":"","url":"https://us.metamath.org/metamath/set.mm","readInstr":"StopBefore","label":"2p2e4","resetNestingLevel":true,"allLabels":[]}],"descr":"Prove that 2 + 2 = 4.","varsText":"","disjText":"","stmts":[{"label":"1","typ":"p","isGoal":false,"cont":"|- 1 e. CC","jstfText":": ax-1cn"},{"label":"2","typ":"p","isGoal":false,"cont":"|- 2 e. CC","jstfText":": 2cn"},{"label":"3","typ":"p","isGoal":false,"cont":"|- 2 = ( 1 + 1 )","jstfText":": df-2"},{"label":"4","typ":"p","isGoal":false,"cont":"|- ( 2 + 2 ) = ( 2 + ( 1 + 1 ) )","jstfText":"3 : oveq2i"},{"label":"5","typ":"p","isGoal":false,"cont":"|- 3 = ( 2 + 1 )","jstfText":": df-3"},{"label":"6","typ":"p","isGoal":false,"cont":"|- 4 = ( 3 + 1 )","jstfText":": df-4"},{"label":"7","typ":"p","isGoal":false,"cont":"|- ( 3 + 1 ) = ( ( 2 + 1 ) + 1 )","jstfText":"5 : oveq1i"},{"label":"8","typ":"p","isGoal":false,"cont":"|- 4 = ( ( 2 + 1 ) + 1 )","jstfText":"6 7 : eqtri"},{"label":"9","typ":"p","isGoal":false,"cont":"|- ( ( 2 + 1 ) + 1 ) = ( 2 + ( 1 + 1 ) )","jstfText":"2 1 1 : addassi"},{"label":"2p2e4","typ":"p","isGoal":true,"cont":"|- ( 2 + 2 ) = 4","jstfText":"9 8 4 : 3eqtr4ri"}]}

However, when I try to more aggressively automate finding the proof, the tool finds other longer proofs but somehow misses this version of the proof. If I delete the steps 1, 2, and 9 (|- 1 e. CC and |- 2 e. CC and |- ( ( 2 + 1 ) + 1 ) = ( 2 + ( 1 + 1 ) )) and remove the justification for the goal, I end up with this proof state:

{"srcs":[{"typ":"Web","fileName":"","url":"https://us.metamath.org/metamath/set.mm","readInstr":"StopBefore","label":"2p2e4","resetNestingLevel":true,"allLabels":[]}],"descr":"Prove that 2 + 2 = 4.","varsText":"","disjText":"","stmts":[{"label":"3","typ":"p","isGoal":false,"cont":"|- 2 = ( 1 + 1 )","jstfText":": df-2"},{"label":"4","typ":"p","isGoal":false,"cont":"|- ( 2 + 2 ) = ( 2 + ( 1 + 1 ) )","jstfText":"3 : oveq2i"},{"label":"5","typ":"p","isGoal":false,"cont":"|- 3 = ( 2 + 1 )","jstfText":": df-3"},{"label":"6","typ":"p","isGoal":false,"cont":"|- 4 = ( 3 + 1 )","jstfText":": df-4"},{"label":"7","typ":"p","isGoal":false,"cont":"|- ( 3 + 1 ) = ( ( 2 + 1 ) + 1 )","jstfText":"5 : oveq1i"},{"label":"8","typ":"p","isGoal":false,"cont":"|- 4 = ( ( 2 + 1 ) + 1 )","jstfText":"6 7 : eqtri"},{"label":"2p2e4","typ":"p","isGoal":false,"cont":"|- ( 2 + 2 ) = 4","jstfText":""}]}

I select the goal and click on "unify" to enter the bottom-up prover. I disable "Allow new disjoints" and "Allow new variables", and leave enabled "Allow new steps". I set "Statement length restriction to "Unrestricted", and the Allowed statements for both the first level and other levels is set to exactly steps 4 and 8. Search depth stays at 4. I then click on "Prove".

There are 3 pages of proofs, all of which work. E.g.:

1 [ : eqid ]    ✓   |- ( 2 + 2 ) = ( 2 + 2 )
2   [ : 2cn ]   ✓   |- 2 e. CC
9   [ : ax-1cn ]    ✓   |- 1 e. CC
10  [2 9 9 : addassi ]  ✓   |- ( ( 2 + 1 ) + 1 ) = ( 2 + ( 1 + 1 ) )
11  [4 1 10 : 3eqtr4ri ]    ✓   |- ( ( 2 + 1 ) + 1 ) = ( 2 + 2 )
2p2e4   [8 11 : eqtr2i ]    ✓   |- ( 2 + 2 ) = 4

However, the proof I use in my demonstration only adds 3 steps (for addassi, ax-1cn, and 2cn). It's not clear to me why the bottom-up prover can't also detect this case in its search. Is this another example where the lack of full unification is missing a case? A bug? Is it something else? Thanks!

david-a-wheeler commented 1 year ago

Here's more information. When I do a bottom-up search the same way, but expressly require that the root be 3eqtr4ri, the only option presented is:

1   [ : eqid ]  ✓   |- 4 = 4
2   [ : eqid ]  ✓   |- ( 2 + 2 ) = ( 2 + 2 )
9   [ : 2cn ]   ✓   |- 2 e. CC
10  [ : ax-1cn ]    ✓   |- 1 e. CC
11  [9 10 10 : addassi ]    ✓   |- ( ( 2 + 1 ) + 1 ) = ( 2 + ( 1 + 1 ) )
12  [4 2 11 : 3eqtr4i ] ✓   |- ( 2 + 2 ) = ( ( 2 + 1 ) + 1 )
2p2e4   [8 1 12 : 3eqtr4ri ]    ✓   |- ( 2 + 2 ) = 4

That's a lot of eqid use. Nothing wrong with that, many people use that style, but it seems odd that it's the only option presented, even though I know there's another proof that uses fewer statements: 3eqtr4ri, addassi, ax-1cn, and 2cn.

david-a-wheeler commented 1 year ago

And more information. I turned on log level 2 on the bottom-up search with the root 3eqtr4ri. It went backwards but only found two working approaches.

I looked at the failing approaches. One said:

New variables are not allowed, but one had to be created when unifying essential hypothesis #1:
1: ?
2: |- 4 = ( ( 2 + 1 ) + 1 )
3: |- ( 2 + 2 ) = ( 2 + ( 1 + 1 ) )

Allowing new variables gave more options but it still didn't lead to the same proof being an option. Which is unexpected to me.

expln commented 1 year ago

Hi David,

mm-lamp counts length of a proof by depth of its proof tree. Two proofs may have same depth, but they may use assertions with different number of arguments. Each argument becomes a step. The different number of arguments results in different number of steps. So two proofs may have different number of steps but from mm-lamp point of view they are both of the same length.

I think this is what happens in the first part of your question. The two screenshots below demonstrate this for the two proofs: proof-1 - the one you've started with. proof-2 - the one you've found with bottom-up prover after removing some steps in the proof-1.

I hope this explains the first part of your question (why mm-lamp doesn't see a shorter proof).

As of the second part (why mm-lamp finds only one proof when you instruct it to use 3eqtr4ri), it requires more time for investigation, I will reply a bit later.

Proof-1: image

Proof-2: image

david-a-wheeler commented 1 year ago

mm-lamp counts length of a proof by depth of its proof tree. Two proofs may have same depth, but they may use assertions with different number of arguments. Each argument becomes a step. The different number of arguments results in different number of steps. So two proofs may have different number of steps but from mm-lamp point of view they are both of the same length.

Sure, I was expecting that. But the diagrams above show 2 different proofs with the same depth. (depth 7 if you count the intermediate expressions as you did above, depth 4 if you only count the justifications). Shouldn't they both show up?

expln commented 1 year ago

depth 7 if you count the intermediate expressions as you did above, depth 4 if you only count the justifications

You are right, I mistakenly included extra steps, correct length is 4.

And yes indeed your initial proof is not showing up in the bottom-up results. At the moment I don't understand why. I need more time for analysis.

david-a-wheeler commented 1 year ago

At the moment I don't understand why. I need more time for analysis.

No problem. I'm hoping that it's a bug that can be fixed and lead to even better automated proof-finding :-).

expln commented 1 year ago

I think you are right and this is a bug. My idea (I have to verify it yet) that specifying that you want to use steps 4 and 8 should be enough to find the required substitution for 3eqtr4ri. But mm-lamp matches steps with the assertion hypotheses in not the most efficient way. The most efficient way would be to match the second essential hypothesis with the step 8, and then match the third essential hypothesis with the step 4. In that case all the variables of the assertion are bound to correct subexpressions. But mm-lamp begins matching with the first essential hypothesis. This leads to the error you pointed out "New variables are not allowed, but one had to be created when unifying essential hypothesis # 1:".

I am almost sure this is the root cause. But in this case I am impressed how all your comments are precise in trying to help me to find the root cause :)

image

david-a-wheeler commented 1 year ago

The most efficient way would be to match the second essential hypothesis with the step 8, and then match the third essential hypothesis with the step 4. In that case all the variables of the assertion are bound to correct subexpressions. But mm-lamp begins matching with the first essential hypothesis.

It sounds like the problem is that the tool should be trying all possibilities (all permutations) & it's only trying one. Basically, it sounds like it's missing an iteration. If that's correct, then adding the iteration may slow down the search (if it has to go to the same depth) but it would make useful results more likely :-).

It makes sense why testing would miss this, if that's the problem. If the test case tries the correct match first, it would work just fine. You have to use a test case where the first-tried hypothesis doesn't work. What's more, all the individual returned results would be correct, the problem is that the returned results would be incomplete.

expln commented 1 year ago

This bug is fixed on dev. It works fine for me for the example from this issue. I have not checked other cases, but all the integration tests pass. Please check how it works for you.

The root cause was exactly as I described.

I think mm-lamp tries all possible permutations. But this is not enough. Actually the result also depends on in which order mm-lamp matches each permutation.

In the fix, I changed this order. Previously used order was not very efficient. Now, I think, it's more efficient. Effectively, I removed some possibilities from being considered and added others. This doesn't change much overall number of possibilities being checked. So it should not affect performance. I think that majority (if not all) of the removed possibilities were useless, and all the added possibilities are useful. So I expect better automated proof-finding after this fix, exactly as you predicted :)

After this fix, you may achieve the desired result with the below steps (after removing 1, 2, and 9 steps in the editor):

  1. Select the goal and click on "unify" to enter the bottom-up prover.
  2. Disable "Allow new disjoints" and "Allow new variables", and leave enabled "Allow new steps".
  3. Leave "Statement length restriction" as "Less".
  4. Set the Allowed statements for the first level to exactly steps 4 and 8. Leave "None" for "other levels".
  5. Search depth stays at 4.
  6. Click on "Prove".

The proof which uses 3eqtr4ri, addassi, ax-1cn and 2cn will be found.

You also may skip step 4 in the instruction above (set "Allowed statements first level" = All). The proof will be found, but it will take more time.

I will test dev environment during one or two days, and then I will release it.

david-a-wheeler commented 1 year ago

This bug is fixed on dev. It works fine for me for the example from this issue.

Wonderful!! Thank you so much.

I have not checked other cases, but all the integration tests pass. Please check how it works for you.

I will. The obvious way is to try other steps in various proofs and see if they can now be found.

I have high hopes that this change will make its automation far more effective without any special casing. Which would be great. In the long run I think it will need some special automation, and someday a programming language, but being able to do much without those is excellent.

I will test dev environment during one or two days, and then I will release it.

Will this be version 12, silently another 11, or 11.0.1? Asking for a friend :-).

david-a-wheeler commented 1 year ago

I tried syl in "easy mode" - that continues to work: https://lamp-guide.metamath.org/#easy-proof-of-syl

I tried proving syl in "hard mode" (axioms only per https://lamp-guide.metamath.org/#hard-mode-proving-syl-using-only-axioms.

I first tried in super-hard-mode with no intermediate steps to help it. I then tried a bottom-up proof for the final qed step. All steps allowed at all levels, statement length unrestricted, anything allowed. It did not find it (to be fair it didn't before either). Shouldn't it find it? It's less than depth 4. It does identify ax-mp as a possible first step backwards, with the right pattern, but doesn't manage to complete it. Here's what it found:

3       ?   |- &W1
4       ?   |- ( &W1 -> ( ph -> ch ) )
qed [3 4 : ax-mp ]  ?   |- ( ph -> ch )

Here is the setup I used:

{"srcs":[{"typ":"Web","fileName":"","url":"https://us.metamath.org/metamath/set.mm","readInstr":"StopAfter","label":"ax-3","resetNestingLevel":true,"allLabels":[]}],"descr":"Prove syl.","varsText":"","disjText":"","stmts":[{"label":"1","typ":"e","isGoal":false,"cont":"|- ( ph -> ps )","jstfText":""},{"label":"2","typ":"e","isGoal":false,"cont":"|- ( ps -> ch )","jstfText":""},{"label":"qed","typ":"p","isGoal":true,"cont":"|- ( ph -> ch )","jstfText":""}]}

That didn't work, so I gave it an intermediate step to use (as described in the tutorial). Here's my new setup:

{"srcs":[{"typ":"Web","fileName":"","url":"https://us.metamath.org/metamath/set.mm","readInstr":"StopAfter","label":"ax-3","resetNestingLevel":true,"allLabels":[]}],"descr":"Prove syl.","varsText":"","disjText":"","stmts":[{"label":"1","typ":"e","isGoal":false,"cont":"|- ( ph -> ps )","jstfText":""},{"label":"2","typ":"e","isGoal":false,"cont":"|- ( ps -> ch )","jstfText":""},{"label":"3","typ":"p","isGoal":false,"cont":"|- ( ( ph -> ps ) -> ( ph -> ch ) )","jstfText":""},{"label":"qed","typ":"p","isGoal":false,"cont":"|- ( ph -> ch )","jstfText":""}]}

Bottom-up proving on either the goal or step before the goal doesn't find the proof, even with Unrestricted & using all steps.

I to go now. Thoughts? Is this expected behavior?

expln commented 1 year ago

Will this be version 12, silently another 11, or 11.0.1? Asking for a friend :-).

This will be version 12 :) I plan to never silently update existing versions because of the caching issue I described here https://github.com/metamath/lamp-guide/issues/78#issuecomment-1613776103 As of 11.0.1, I still think that just increasing version by 1 will be sufficient.

Is this expected behavior?

I need more time to understand what's going on. But if you are reading this and can reply, could you please confirm that you found an example which worked before the fix and doesn't work after the fix? Or you are describing something new? Because this fix will not drastically increase power of the prover. I think the increase is relatively small.

david-a-wheeler commented 1 year ago

could you please confirm that you found an example which worked before the fix and doesn't work after the fix? Or you are describing something new? Because this fix will not drastically increase power of the prover. I think the increase is relatively small.

The super-hard-mode test did not work before, and it doesn't work now, so this is unchanged behavior. Still, I'd like to understand this behavior. I think I do, but I'd like to confirm.

In this case, the bottom-up prover working from the goal won't find a full solution, but it will find this:

4       ?   |- &W1
5       ?   |- ( &W1 -> ( ph -> ch ) )
qed [4 5 : ax-mp ]  ?   |- ( ph -> ch )

It should be able to eventually backtrack to find a match, because the work variable &W1 has the right type. However, I think the problem is that the work variable &W1 is in multiple places and we're doing matching not full unification. Therefore, if it unified &W1 the tool would have to propagate the match, but since it can't do that it refuses the match. Is that correct?

david-a-wheeler commented 1 year ago

BTW, here's a final proof. Backwards from the goal it's only depth 3, and there are few axioms/theorems to try so a backwards search should find it easily. Again, I think it's because the bottom-up prover can't do full unification of work variables that occur in multiple places, but I'd like confirmation (or clarification if I misunderstood what it's doing:

{"srcs":[{"typ":"Web","fileName":"","url":"https://us.metamath.org/metamath/set.mm","readInstr":"StopBefore","label":"ax-3","resetNestingLevel":true,"allLabels":[]}],"descr":"Prove syl.","varsText":"","disjText":"","stmts":[{"label":"syl.1","typ":"e","isGoal":false,"cont":"|- ( ph -> ps )","jstfText":""},{"label":"syl.2","typ":"e","isGoal":false,"cont":"|- ( ps -> ch )","jstfText":""},{"label":"3","typ":"p","isGoal":false,"cont":"|- ( ( ps -> ch ) -> ( ph -> ( ps -> ch ) ) )","jstfText":": ax-1"},{"label":"4","typ":"p","isGoal":false,"cont":"|- ( ( ph -> ( ps -> ch ) ) -> ( ( ph -> ps ) -> ( ph -> ch ) ) )","jstfText":": ax-2"},{"label":"2","typ":"p","isGoal":false,"cont":"|- ( ph -> ( ps -> ch ) )","jstfText":"syl.2 3 : ax-mp"},{"label":"1","typ":"p","isGoal":false,"cont":"|- ( ( ph -> ps ) -> ( ph -> ch ) )","jstfText":"2 4 : ax-mp"},{"label":"syl","typ":"p","isGoal":true,"cont":"|- ( ph -> ch )","jstfText":"syl.1 1 : ax-mp"}]}
expln commented 1 year ago

Hi David,

Yes, for all the cases you mentioned as those which could not be proved by the bottom-up prover the reason is that full unification is not implemented. However, for one case I doubt it will become "bottom-up provable" even after the full unification is implemented. I mean this case:

image

And the corresponding JSON:

{"srcs":[{"typ":"Web","fileName":"","url":"https://us.metamath.org/metamath/set.mm","readInstr":"StopAfter","label":"ax-3","resetNestingLevel":true,"allLabels":[]}],"descr":"Prove syl.","varsText":"","disjText":"","stmts":[{"label":"1","typ":"e","isGoal":false,"cont":"|- ( ph -> ps )","jstfText":""},{"label":"2","typ":"e","isGoal":false,"cont":"|- ( ps -> ch )","jstfText":""},{"label":"qed","typ":"p","isGoal":true,"cont":"|- ( ph -> ch )","jstfText":""}]}

Is mmj2 able to prove it automatically? (I need some time to set it up, but if you use mmj2 often that may be easy for you to check; if not, then please don't mind, I'll check it myself). If mmj2 is able to prove it, then do you know if it uses some heuristics or full unification only is enough to prove this?

In this case, the bottom-up prover working from the goal won't find a full solution, but it will find this:

4     ?   |- &W1
5     ?   |- ( &W1 -> ( ph -> ch ) )
qed   [4 5 : ax-mp ]  ?   |- ( ph -> ch )

It should be able to eventually backtrack to find a match, because the work variable &W1 has the right type. However, I think the problem is that the work variable &W1 is in multiple places and we're doing matching not full unification. Therefore, if it unified &W1 the tool would have to propagate the match, but since it can't do that it refuses the match. Is that correct?

Currently, mm-lamp can find "what to place instead of variables in assertions". It cannot do the same for expressions (i.e. "what to place instead of variables in steps"). |- ( &W1 -> ( ph -> ch ) ) is a step, and &W1 is a variable in a step, so mm-lamp doesn't attempt to find "what to put instead of &W1". I plan to implement "what to place instead of variables in steps" but only for the very first level, i.e. when the root assertion gets applied to the step to prove. Implementing this for further levels might be very complex. I will check it more deeply, of course, but for now I feel like I will have to rewrite a lot of code (all the prover logic), and I don't have time for this. That's why I doubt that full unification will be able to solve this particular issue with &W1.

I have already released this fix to "prod" https://expln.github.io/lamp/v12/index.html Please confirm this issue may be closed. (I mentioned one of the comments in another relevant issue for not to forget it - https://github.com/expln/metamath-lamp/issues/77#issuecomment-1616826010 )

david-a-wheeler commented 1 year ago

is mmj2 able to prove this automatically?

No, though mmj2 doesn't have exactly the same capability as "bottom-up prover" either.

mmj2 has a different approach. It has some built-in automations, some used automatically & some you have to say "okay to try" using "!". You can't ask it to fill in arbitrary depths, it only goes depth 1 and then tries to use various built-in rules to try our further depths (which results in fast results when it works, and no results when it doesn't). The closest equivalent would be this, and it can't fully automate this:

$( <MM> <PROOF_ASST> THEOREM=syllogism LOC_AFTER=ax-3
hd1::syllogism.1 |- ( ph -> ps ) 
hd2::syllogism.2 |- ( ps -> ch ) 
!qed::         |- ( ph -> ch ) 
$)

If you give mmj2, it will fill in a little:

$( <MM> <PROOF_ASST> THEOREM=syllogism LOC_AFTER=ax-3
hd1::syllogism.1 |- ( ph -> ps ) 
hd2::syllogism.2 |- ( ps -> ch ) 
!              |- ( ( ph -> ps ) -> ( ph -> ch ) ) 
!qed::         |- ( ph -> ch ) 
$)

Here's what it generates in this case (it's looking for me to fill in a "d1" label):

$( <MM> <PROOF_ASST> THEOREM=syllogism LOC_AFTER=ax-3
hd1::syllogism.1 |- ( ph -> ps ) 
hd2::syllogism.2 |- ( ps -> ch ) 
!d3::          |- ( ( ph -> ps ) -> ( ph -> ch ) ) 
qed:d1,d3:ax-mp |- ( ph -> ch ) 
$)

mmj2 can do this automatically:

$( <MM> <PROOF_ASST> THEOREM=syllogism LOC_AFTER=
h              |- ( ph -> ps ) 
h              |- ( ps -> ch ) 
!              |- ( ph -> ( ps -> ch ) ) 
!              |- ( ( ph -> ps ) -> ( ph -> ch ) ) 
!qed           |- ( ph -> ch ) 
$)

Resulting in:

$( <MM> <PROOF_ASST> THEOREM=syllogism LOC_AFTER=
hd1::syllogism.1 |- ( ph -> ps ) 
hd2::syllogism.2 |- ( ps -> ch ) 
d3:d2:a1i      |- ( ph -> ( ps -> ch ) ) 
d4:d3:a2i      |- ( ( ph -> ps ) -> ( ph -> ch ) ) 
qed:d1,d4:ax-mp |- ( ph -> ch ) 
$=  ( syllogism.1 syllogism.2 wi a1i a2i ax-mp ) ABFACFDABCBCFAEGHI $.
$)
expln commented 1 year ago

Thank you for this useful update. I will compare it to how mm-lamp works when the full unification is implemented.

I am closing this issue since the original problem is resolved. Please feel free to re-open this issue if you think there is still something to be handled.