expln / metamath-lamp

Metamath-lamp (Lite Assistant for Metamath Proofs) is a GUI-based proof assistant for creating formal mathematical proofs in Metamath that does not require installation (just run it directly using your web browser).
https://expln.github.io/lamp/latest/index.html
MIT License

Proposal: Allow bottom-up prover to unify work variables more broadly (long-term) #77

Open david-a-wheeler opened 1 year ago

david-a-wheeler commented 1 year ago

I have an idea - it'd be great if the unifier/bottom-up prover could unify on work variables more aggressively, at least optionally. This is an area where mmj2 can currently do more than metamath-lamp can.

I'm trying to re-prove syl, loading set.mm and stopping after ax-3 (that is, I only have the axioms in the context).

I want to prove, using ax-2, the following step which was created earlier by a backwards application of ax-mp:

|- ( &W1 -> ( ( ph -> ps ) -> ( ph -> ch ) ) )

However, when I select this statement and unify (prove bottom-up), ax-2 doesn't appear as an option. That's true even if I allow "everything" by changing the Length restriction to "No", allow root statements at all levels, and enable everything else. If I try to enter ax-2 directly as a label, the backwards prover won't offer ax-2 as an option - it clearly thinks this statement cannot match ax-2.

Follow this link to see the state. The full situation is also captured by this json:

{"srcs":[{"typ":"Web","fileName":"","url":"https://us.metamath.org/metamath/set.mm","readInstr":"StopAfter","label":"ax-3","resetNestingLevel":true,"allLabels":[]}],"descr":"Reprove syllogism (\"syl\"), using only axioms.","varsText":"var1 wff &W1","disjText":"","stmts":[{"label":"syl.1","typ":"e","cont":"|- ( ph -> ps )","jstfText":""},{"label":"syl.2","typ":"e","cont":"|- ( ps -> ch )","jstfText":""},{"label":"3","typ":"p","cont":"|- ( &W1 -> ( ( ph -> ps ) -> ( ph -> ch ) ) )","jstfText":""},{"label":"2","typ":"p","cont":"|- &W1","jstfText":""},{"label":"1","typ":"p","cont":"|- ( ( ph -> ps ) -> ( ph -> ch ) )","jstfText":"2 3 : ax-mp"},{"label":"syl","typ":"p","cont":"|- ( ph -> ch )","jstfText":"syl.1 1 : ax-mp"}]}

The desired state is to be able to unify it with ax-2.

It also won't unify through merging. I can search for ax-2 and insert it, which is fine, and the instance has a bunch of metavariables as expected. But "merge" will not merge the original statement and the instance of ax-2, no matter what the order is.

I can replace the work variable &W1 with ( ph -> ( ps -> ch ) ) and apply "unify". Then bottom-up proving immediately unifies to that result.

This is unfortunate. If the backwards prover could handle this case (even optionally) I bet it could automatically prove a lot more. In fact, if it noticed these matches, it could probably prove some things that mmj2 struggles with (because in mmj2 you can't really control the search of just one statement very well).

I tested on this, but I don't think the version matters: https://expln.github.io/lamp/dev/index.html

This is in contrast to mmj2, which can unify some work variables and lets you specify an assertion like ax-2 to be used on such a statement. A "before" and "after" should clarify things. Here's the "before" for "syllogism", where I just typed in ax-2 in the analogous situation (the "*" comments are hints to myself about where I'm going):

$( <MM> <PROOF_ASST> THEOREM=syllogism LOC_AFTER=

hd1::syllogism.1 |- ( ph -> ps ) 
hd2::syllogism.2 |- ( ps -> ch ) 

* !              |- ( ph -> ( ps -> ch ) ) 
* !              |- ( ( ph -> ps ) -> ( ph -> ch ) ) 
!d3::              |- &W1
!d5::              |- &W2
!d6::ax-2              |- ( &W2 -> ( &W1 -> ( ph -> ch ) ) )
d4:d5,d6:ax-mp          |- ( &W1 -> ( ph -> ch ) )
qed:d3,d4:ax-mp     |- ( ph -> ch ) 

$)

In mmj2, after pressing control-U, I get a unification which helps me to keep going:

$( <MM> <PROOF_ASST> THEOREM=syllogism LOC_AFTER=

hd1::syllogism.1 |- ( ph -> ps ) 
hd2::syllogism.2 |- ( ps -> ch ) 

* !              |- ( ph -> ( ps -> ch ) ) 
* !              |- ( ( ph -> ps ) -> ( ph -> ch ) ) 
!d3::              |- ( ph -> &W3 )
!d5::              |- ( ph -> ( &W3 -> ch ) )
d6::ax-2           |- (  ( ph -> ( &W3 -> ch ) )
                      -> ( ( ph -> &W3 ) -> ( ph -> ch ) ) )
d4:d5,d6:ax-mp     |- ( ( ph -> &W3 ) -> ( ph -> ch ) )
qed:d3,d4:ax-mp     |- ( ph -> ch ) 

$)

expln commented 1 year ago

Thanks for the detailed explanations. I will analyze the issue and return with some response.

david-a-wheeler commented 1 year ago

Thanks, and as always, thanks for your time.

I did an experiment with a surprising result, which may be helpful to you. I expected that the problem was that "&W1" would only match a wff variable (which would be an understandable limitation). So I modified "Variables" to say:

.tempwff wff &W1

I then modified my "statement 3" to delete its justification & changed the first wff variable to a work variable, resulting in this:

|- ( ( &W1 -> ( ps -> ch ) ) -> ( ( ph -> ps ) -> ( ph -> ch ) ) )

Even with this change, a bottom-up proof didn't find ax-2.

david-a-wheeler commented 1 year ago

In case it's relevant, it'd also be awesome if the tool could automatically replace work variables on unification. The review posted on May 27, 2023, at 10:17 AM, by Alexander van der Vekens on the Metamath mailing list (metamath@googlegroups.com) said:

The replacement of a temporary variable is a little bit cumbersome: if you want to replace &C1 by 6, you cannot change it at one place (and then it is replaced everywhere, as in mmj2), but you have to click A (apply a substitution to all statements), then enter &C1 and 6 manually.

I'm not sure this is related, but since it might be, I thought I'd note it.

expln commented 1 year ago

I analyzed this issue, and indeed this is not a bug; it is a limitation of the mm-lamp unifier.

This case is special because of the following reason. In order to unify ax-2 with |- ( &W1 -> ( ( ph -> ps ) -> ( ph -> ch ) ) ) we have to substitute variables in both expressions. But mm-lamp's unifier is unable to find such substitutions.

What it can do is this: given two expressions exprA and exprB, it can find a substitution subsUni such that applying subsUni to exprA yields exprB, i.e. subsUni(exprA) = exprB.

What the case described in this issue requires is to find a substitution subsBi such that (using similar notation): subsBi(exprA) = subsBi(exprB).

In my notation, Uni stands for "unidirectional" and Bi stands for "bidirectional".

Let me demonstrate this on real expressions. In order for the next example not to be misleading, I have to express ax-2 using work variables, but this should not be a problem because it doesn't matter what variable names we choose to describe an assertion as long as the variable types are correct (please correct me if I am wrong).

exprA = |- ( ( &W3 -> ( &W4 -> &W2 ) ) -> ( ( &W3 -> &W4 ) -> ( &W3 -> &W2 ) ) )
exprB = |- ( &W1 -> ( ( ph -> ps ) -> ( ph -> ch ) ) )

A substitution subsBi (such that subsBi(exprA) = subsBi(exprB) ) exists, and it is as follows:

&W2 → ch
&W3 → ph
&W4 → ps
&W1 → ( ph -> ( ps -> ch ) )

But mm-lamp searches only for subsUni, such that subsUni(exprA) = exprB. Obviously it doesn't exist because length(exprB) < length(exprA).
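
For illustration, here is a minimal sketch of the "unidirectional" operation on parsed expression trees (written in TypeScript with assumed type and function names; this is not mm-lamp's actual code). Because only the pattern's metavariables may be substituted, matching exprA against the shorter exprB above necessarily fails:

type Expr =
  | { kind: "mvar"; name: string }              // a work variable such as &W1
  | { kind: "app"; op: string; args: Expr[] };  // a syntax constructor; fixed
                                                // variables like ph are 0-ary apps

type Subst = Map<string, Expr>;

function equal(a: Expr, b: Expr): boolean {
  if (a.kind === "mvar" || b.kind === "mvar")
    return a.kind === "mvar" && b.kind === "mvar" && a.name === b.name;
  return a.op === b.op && a.args.length === b.args.length &&
         a.args.every((x, i) => equal(x, b.args[i]));
}

// Matching ("unidirectional"): find s with s(pattern) = target, substituting
// only metavariables that occur in the pattern; the target is left untouched.
function matchExpr(pattern: Expr, target: Expr, s: Subst = new Map()): Subst | null {
  if (pattern.kind === "mvar") {
    const bound = s.get(pattern.name);
    if (bound !== undefined) return equal(bound, target) ? s : null;
    s.set(pattern.name, target);
    return s;
  }
  // A metavariable on the target side is treated like an ordinary symbol here,
  // so it can only match an identical metavariable in the pattern.
  if (target.kind === "mvar") return null;
  if (pattern.op !== target.op || pattern.args.length !== target.args.length) return null;
  for (let i = 0; i < pattern.args.length; i++) {
    if (matchExpr(pattern.args[i], target.args[i], s) === null) return null;
  }
  return s;
}

With exprA as the pattern and exprB as the target, the recursion immediately pits the pattern subterm ( &W3 -> ( &W4 -> &W2 ) ) against the bare metavariable &W1 on the target side, and no one-sided substitution can reconcile them; the "bidirectional" search discussed in the rest of this thread is what succeeds here.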

The workaround you've described in the guide, where you manually substitute &W1 → ( ph -> ( ps -> ch ) ), is absolutely correct. It bypasses this limitation using the smallest number of logical steps.

I would like to share another possible solution. It uses more logical steps, but it demonstrates that such a "bidirectional" substitution may be achieved by two "unidirectional" substitutions. I recorded a video because it would be difficult to write (and probably to read) an explanation - (no sound) https://drive.google.com/file/d/1KIr0eOEmH4VoIHOHFhqXwBn08h-xGicV/view?usp=sharing

To fix this issue, I plan to use the fact that a "bidirectional" substitution may be achieved by two "unidirectional" substitutions. I don't have a very clear plan how to implement this in mm-lamp, but I have some ideas to begin with.

Sorry if my explanation is difficult to understand because of unusual terminology. I am ready to answer clarifying questions if you have any.

david-a-wheeler commented 1 year ago

@expln - thanks for the explanation! I also appreciate you pointing out that it's possible to select two expressions to make replacements easier.

For now, I think the solution is to document both approaches in the guide, given the current limitations in the unification algorithm. That said, I hope we don't stop there; see my next comment below...

david-a-wheeler commented 1 year ago

In the longer term, it'd be great if the unifier could handle this more general case, at least as an enable-able option. Unification has been studied to death; I bet there are some awesome algorithms that do a lot with decent performance.

@digama0 - any suggestions on how to improve the metamath-lamp unification algorithm, given the discussion above? In particular, any pointers to some of the most relevant literature?

digama0 commented 1 year ago

There is a canonical algorithm for what @expln calls "bidirectional unification" (normally this is simply called "unification", and the unidirectional version is called "matching") which can be done purely recursively. There is a description on Wikipedia, although I think the version I have implemented for MM1 is both more executable and simpler. I will summarize it:
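
A minimal sketch of this standard recursive algorithm, in TypeScript for illustration (illustrative names only; this is neither MM1's actual Rust implementation nor mm-lamp code): it extends one substitution while walking both trees, binds a metavariable whenever one side is an unbound metavariable (after an occurs check), and recurses into children when both sides are applications of the same constructor.

type Expr =
  | { kind: "mvar"; name: string }              // metavariable / work variable
  | { kind: "app"; op: string; args: Expr[] };  // constructor node; fixed variables
                                                // such as ph are 0-ary apps

type Subst = Map<string, Expr>;

// Follow bindings until we reach an unbound metavariable or an app node.
function walk(e: Expr, s: Subst): Expr {
  while (e.kind === "mvar" && s.has(e.name)) e = s.get(e.name)!;
  return e;
}

// Occurs check: binding x := e would be circular if x occurs in e (modulo s).
function occurs(x: string, e: Expr, s: Subst): boolean {
  e = walk(e, s);
  if (e.kind === "mvar") return e.name === x;
  return e.args.some(a => occurs(x, a, s));
}

// Try to extend s so that s(a) = s(b); return null if no unifier exists.
function unify(a: Expr, b: Expr, s: Subst = new Map()): Subst | null {
  a = walk(a, s);
  b = walk(b, s);
  if (a.kind === "mvar" && b.kind === "mvar" && a.name === b.name) return s;
  if (a.kind === "mvar") {
    if (occurs(a.name, b, s)) return null;
    s.set(a.name, b);
    return s;
  }
  if (b.kind === "mvar") return unify(b, a, s);
  if (a.op !== b.op || a.args.length !== b.args.length) return null;
  for (let i = 0; i < a.args.length; i++) {
    if (unify(a.args[i], b.args[i], s) === null) return null;
  }
  return s;
}

// The exprA / exprB example from earlier in this thread:
const mv = (name: string): Expr => ({ kind: "mvar", name });
const app = (op: string, ...args: Expr[]): Expr => ({ kind: "app", op, args });
const wi = (x: Expr, y: Expr) => app("wi", x, y);   // ( x -> y )
const ph = app("wph"), ps = app("wps"), ch = app("wch");

// exprA = ( ( &W3 -> ( &W4 -> &W2 ) ) -> ( ( &W3 -> &W4 ) -> ( &W3 -> &W2 ) ) )
const exprA = wi(wi(mv("&W3"), wi(mv("&W4"), mv("&W2"))),
                 wi(wi(mv("&W3"), mv("&W4")), wi(mv("&W3"), mv("&W2"))));
// exprB = ( &W1 -> ( ( ph -> ps ) -> ( ph -> ch ) ) )
const exprB = wi(mv("&W1"), wi(wi(ph, ps), wi(ph, ch)));

// unify(exprA, exprB) succeeds with &W3 := ph, &W4 := ps, &W2 := ch and
// &W1 := ( &W3 -> ( &W4 -> &W2 ) ), which resolves to ( ph -> ( ps -> ch ) )
// once the other bindings are applied; this is the subsBi listed earlier.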

david-a-wheeler commented 1 year ago

@digama0 - thank you so much for your wonderfully clear & specific reply!

Now things make much more sense. Alexander van der Vekens earlier complained of problems using metamath-lamp, and while I'm sure part of the problem was a lack of documentation, this difference is almost certainly part of the issue as well.

I don't know how long it would take to implement full unification instead. I suspect that once it is implemented, the tool would become much more capable AND would be easier to use for those who've used other tools like mmj2. Full unification might be slower than matching; I can see offering matching vs. unification as an option in the search dialog. That said, I suspect "full unification" would be the preferred setting, as it would greatly increase the likelihood of finding solutions.

expln commented 1 year ago

Thank you @digama0 for your detailed answer and especially for providing the summary of the algorithm. I would not understand it from the Rust code without that summary.

In this notation t(a1, ..., an), is my understanding correct that this represents a tree structure where a1,...,an are sibling nodes of this tree? And overall, all expressions are stored as trees (or in other format which may be transformed into a tree very fast)?

@david-a-wheeler

I hope we don't stop there. In the longer term, it'd be great if the unifier could handle this more general case, at least as an enable-able option.

Sure, I already have an idea of what I can do to handle this and other similar cases. But yeah, that's not a simple change, so it will need some time to implement.

Thank you for posting this issue with a detailed description and explanations of how it is handled in mmj2. Currently this is the only way I can discover such non-trivial use cases that come up during the creation of proofs.

digama0 commented 1 year ago

In this notation t(a1, ..., an), is my understanding correct that this represents a tree structure where a1,...,an are sibling nodes of this tree? And overall, all expressions are stored as trees (or in other format which may be transformed into a tree very fast)?

Yes, a1, ..., an are the children of a node with t as the constructor. And yes, this is presuming that the expressions are stored as trees, or else already parsed to trees. For example, ( &W1 -> ( ph -> &W2 ) ) is represented as app[wi](mvar[wff, 1], app[wi](var[wph], mvar[wff, 2])) (where t(a1,..., an) is now being written more explicitly as app[t](a1, ..., an)); for most purposes you can probably treat var nodes the same as app, except for $d propagation.

I would say that not doing parsing would be a big misstep for any MM proof assistant - you gain the ability to read some unusual metamath databases at the cost of not being able to reason about the proof at a high enough level to perform useful automation activities.

david-a-wheeler commented 1 year ago

If it's not too hard, I suggest leaving the old unifier as an option, at least for bottom-up search. If the bottom-up prover is too powerful, it's hard to make simple examples in the guide to explain how to deal with cases where the automation doesn't work. So having ways to disable the automation can help. That said, I expect that the default should be full unification.

david-a-wheeler commented 1 year ago

FYI, I looked for some more resources about implementing unification, especially in OCaml, ReScript, or JavaScript. My hope is that this would help in addition to the awesome info from Mario above. Unification is a really old, well-studied problem; Robinson's unification algorithm dates from 1965.

Here are some things I found that might be useful:

BTW, there's a nice video of the Curry-Howard Correspondence here (I found it while looking for examples): https://cs3110.github.io/textbook/chapters/adv/curry-howard.html

expln commented 1 year ago

Thanks David and Mario for the detailed explanations! The unification algorithm described by Mario looks really easy. I will analyze everything and return with my feedback.

david-a-wheeler commented 1 year ago

I totally understand if that's deferred, but I think this one is key. I think there are two parts to this:

  1. full unification during global substitution (the "substitution" icon) - I think this should always be the case.
  2. full unification during bottom-up search - I think this should be an option for bottom-up search, but also supporting the current matching algorithm.

expln commented 1 year ago

Supporting full unification in the global substitution dialog is also an excellent idea. I will implement this. But I think it should be optional too. Currently I am thinking of the following design for the global substitution dialog:

  1. As the first step, users have to select which algorithm they want to use to find a substitution, matching or unification: "Find a substitution by: [ ] matching [v] unification". The selected algorithm is remembered by the app and is used for the next invocations of the dialog, so if a user always uses unification then they don't have to switch from matching to unification each time.
  2. As the second step, users will have to provide two expressions in two text fields, the same way as it is implemented now. But the labels of those two fields will vary depending on the selected algorithm:

    • for matching: "Match what" and "Match with"
    • for unification: "Unify what" and "Unify with"

    BTW, I've just recently renamed "Replace what" and "Replace with" to "Match what" and "Match with" https://github.com/expln/metamath-lamp/commit/174138de70f93f79c28667f2b3162ebb76b4e69e I hope this reduces the naming inconsistency. I will also update the guide after I release v11.

  3. Everything else remains the same as it works now: press the "Find Substitution" button, then select one substitution and click the "Apply" button.

digama0 commented 1 year ago

My 2 cents: Users don't care what algorithm you are using (unless it fails to do its job), and unless there are situations where each beats out the other on some axis (quality or performance) there is no reason to surface this implementation detail to the user. If you want to keep the matching algorithm around, just put it behind a dev option in the code.

Unification is very much a "behind the scenes" kind of algorithm. It is very important, but people just expect it to do its job and not pay much attention to it.

david-a-wheeler commented 1 year ago

It's hard to imagine a case where global substitution (the "substitution" icon) shouldn't use full unification. I can't think of any use cases. I like options, but only where there's a reason to use them.

It'll change my syl example in the tutorial, though. I'll probably have to wait to record that until we know what it looks like and I can adjust to match.

expln commented 1 year ago

It is not so clear to me. I can imagine many different substitutions depending on the algorithm and additional settings. However, I have not yet studied the different unification algorithms mentioned in this issue. Probably it will become clearer to me after I study them. But for now, I see that results may differ, which is why I wrote about allowing users to select from two options.

Assume we have the following symbols:
constants: ~ ( )
global variables (defined in the loaded context): G1 G2 G3
local variables (defined in the editor): L1 L2

Example 1 (not possible in set.mm due to the ambiguity of the syntax in this example, but it demonstrates my current understanding better):

All the algorithms below use the same two expressions:
Expr1: L1 ~ G2
Expr2: G1 ~ L2 ~ G3

(all other variables not mentioned in the substitution are substituted with themselves, i.e. G1 → G1, etc.)

Example 2 (should be possible in set.mm):

All the algorithms below use the same two expressions:
Expr1: L1 ~ G2
Expr2: ( G1 ~ L2 ) ~ G3

(all other variables not mentioned in the substitution are substituted with themselves, i.e. G1 → G1, etc.)

As you can see, the substitutions found may differ depending on the algorithm used. Probably, when the syntax is unambiguous, as in set.mm, full unification may replace matching. But I am trying to keep mm-lamp as general as possible (that's what inspires me), and in that case matching cannot be skipped.

Please provide your thoughts and/or ask questions if any.

digama0 commented 1 year ago

Example 2 (should be possible in set.mm):

I don't know what the distinction you are making between "local" and "global" variables is. Do you have an example? In mmj2 there are variables like A, which might e.g. have been part of the theorem statement; dummy variables like z that are in scope via $f declarations and are legal for use in the course of a proof; and work variables like &C1 which represent subterms which have not yet been filled in and without which the proof cannot be considered complete.

To give this consistent terminology, I will call variables of the first two kinds "local constants" to emphasize that while these are variables from the perspective of the overall database, within the context of the proof they cannot be substituted and hence they are just as constant as actual constants like 0. For the purpose of unification you should treat them like constants.

The third kind are what I would call "metavariables"; these are being instantiated by the prover and are what we actually want to be assigning in this algorithm. When you are doing proof search and want to apply a theorem, or a theorem is used in the worksheet, the first thing you do is pull the theorem statement and replace all the variables in the theorem's frame by fresh metavariables. For example if the theorem is A = B, B = C |- A = C then you replace the variables with new metavariables ?A ?B ?C to get the statement ?A = ?B, ?B = ?C |- ?A = ?C, and then use unification between the goal and the stated goal, or the hypotheses and the referenced hypothesis statements, whatever you happen to have on hand to try to determine the values of these new metavariables.
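
A small sketch of that first step (TypeScript again, assumed names; not mmj2's or mm-lamp's actual code). Here the expression type also distinguishes the assertion's own frame variables from metavariables; every frame variable is renamed to a fresh metavariable, reusing the same metavariable for repeated occurrences, before unification is attempted:

type Expr =
  | { kind: "var"; name: string }               // frame variable of the assertion
  | { kind: "mvar"; name: string }              // fresh metavariable
  | { kind: "app"; op: string; args: Expr[] };

let mvarCounter = 0;
const freshMvar = (): Expr => ({ kind: "mvar", name: `?${++mvarCounter}` });

// Rename every frame variable to a fresh metavariable. Applied to the frame of
// "A = B, B = C |- A = C" this yields "?1 = ?2, ?2 = ?3 |- ?1 = ?3".
function toMetavariables(e: Expr, renaming: Map<string, Expr>): Expr {
  switch (e.kind) {
    case "var": {
      let mv = renaming.get(e.name);
      if (mv === undefined) { mv = freshMvar(); renaming.set(e.name, mv); }
      return mv;
    }
    case "mvar":
      return e;
    case "app":
      return { kind: "app", op: e.op, args: e.args.map(a => toMetavariables(a, renaming)) };
  }
}

// The same renaming map is shared across all hypotheses and the conclusion of
// one assertion, so occurrences of the same frame variable stay linked.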

Now, given your description my interpretation is that both L1 and G1 are metavariables as I have described them, so we would get the "Unification, unrestricted" version. How does this proceed?

We want to unify L1 ~ G2 =?= ( G1 ~ L2 ) ~ G3. The top-level ~ nodes match, so we recurse into the children: L1 is a metavariable and gets assigned L1 := ( G1 ~ L2 ), and then G2 =?= G3 assigns G2 := G3, after which both sides read ( G1 ~ L2 ) ~ G3.

Note that there is no branching in this algorithm. Only one result is produced. (The ambiguous grammar version of this algorithm does have branching.) There is no need to consider alternative unifications like G3 := G2 because all of them yield the same result, up to the naming of the metavariables (which is immaterial). This is known as the "most general unifier" (mgu) in the literature, there is a very concrete sense in which all possible results of the algorithm are equivalent.

What I just described is your unification (1), this is the mgu. (2) is also the mgu. (3) and (4) are not correct, they are unifiers but not most general unifiers. (Actually, (3) and (4) are not even proper simultaneous substitutions, since G2 appears on the left and right of a rule in (3). I'm going to assume that's a typo and you meant to use G3 → G2 in (3) and G2 → G3 in (4).)

The key property of an mgu is that any unification must be a substitution instance of the mgu. If you assign L1 := ( G1 ~ G2 ), L2 := G2, G3 := G2 then both sides become ( G1 ~ G2 ) ~ G2, but L1 := ( a ~ b ), G2 := c, G1 := a, L2 := b, G3 := c is a unifier (resulting in ( a ~ b ) ~ c) which is not a substitution instance of ( G1 ~ G2 ) ~ G2, so this is not an mgu. Therefore (3), (4), and presumably everything else in your "etc." is rejected as not-best solutions to the constraints, and (1) and (2) are equivalent so you can return either one.
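
In standard notation, the property just described says that a substitution $\sigma$ is a most general unifier of $s$ and $t$ when $\sigma(s) = \sigma(t)$ and, for every other unifier $\tau$ of $s$ and $t$, there exists a substitution $\rho$ with $\tau = \rho \circ \sigma$ (the textbook definition, restated here for reference).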

Example 1 (not possible in set.mm due to the ambiguity of the syntax in this example, but it demonstrates my current understanding better):

FWIW I think this should not be considered as part of the design at all. Ambiguous grammars are worth supporting only insofar as they can be supported without impacting the unambiguous grammar case, since that's what matters in practice.

The algorithm I gave for unification does not work in the presence of grammar ambiguity. You can plausibly extend the algorithm to handle it by considering all possible parses (or something along those lines), but it is more complex and frankly exceeds my personal budget for complication for niche use cases.

Getting a correct result here is tricky. This algorithm would look a lot more like a string diffing algorithm: you must match up all the constants and variables on the left and right, considering all possible interleavings of the constants. I will leave off a precise description of the algorithm, but in this case you would end up with the following results:

  1. The LHS ~ is in G1: G1 := L1 ~ L3, G2 := L3 ~ L2 ~ G3, both sides become L1 ~ L3 ~ L2 ~ G3 and a new mvar L3 is created
  2. The LHS ~ matches the first ~ on the RHS: L1 := G1, G2 := L2 ~ G3, both sides become G1 ~ L2 ~ G3
  3. The LHS ~ is in L2: L1 := G1 ~ L3, G2 := L4 ~ G2, L2 := L3 ~ L4, both sides become G1 ~ L3 ~ L4 ~ G2 and new mvars L3, L4 are created
  4. The LHS ~ matches the second ~ on the RHS: L1 := G1 ~ L2, G2 := G3, both sides become G1 ~ L2 ~ G3
  5. The LHS ~ is in G3: G3 := L3 ~ G2, L1 := G1 ~ L2 ~ L3, both sides become G1 ~ L2 ~ L3 ~ G2 and a new mvar L3 is created

There are no additional results. Note that for larger expressions this list grows pretty fast, and most cases are not possible because of syntactic constraints. You probably want to use a bracket matching heuristic here.

david-a-wheeler commented 1 year ago

Whups, I forgot a third case where full unification would make sense, the "unify" icon (!).

Correction, I think there are 3 cases where full unification makes sense (please let me know if I'm mistaken):

  1. full unification during global substitution (the "substitution" icon) - I think this should always be the case. If you want to make it an option that's fine (especially while you're unsure of its impact), but I'm guessing users would always leave it on.
  2. full unification during bottom-up search - I think this should be an option for bottom-up search, but also supporting the current matching algorithm.
  3. full unification when pressing the "unify" icon. This could be controlled by a setting, but mmj2 manages to do this quickly (on control-U) on even long proofs, so I suspect this would just stay "always on". This would mean someone could edit a statement to change just one instance of a given work variable, unify, & then suddenly that work variable would be replaced everywhere (without using "replace"). This is functionality mmj2 users have become used to.

I'm guessing a single full unification implementation could be used in all of these cases.

Agreed? What am I missing?

expln commented 1 year ago

Notes for me. Check these comments when full unification is implemented:

expln commented 1 year ago

@digama0 @david-a-wheeler

I am sorry for the late response. I needed some time to read, understand and think of different cases.

I don't know what the distinction you are making between "local" and "global" variables is. Do you have an example?

Unlike mmj2, mm-lamp doesn't have metavariables. Mm-lamp uses only usual variables and constants, and it can substitute usual variables with expressions if mm-lamp can prove the expression is of the same type as the variable. "Local" and "global" have similar meaning as in other programming languages. Internally, mm-lamp uses a context with three levels of nesting:

$(
nesting level 0
this is the global scope - everything mm-lamp loaded from external sources.
in the code this is called preCtx - preloaded context.
$)

$v global_var1 global_var2 $.
global_var1_def $f class global_var1 $.
global_var2_def $f class global_var2 $.
$d global_var1 global_var2 $.

${
    $(
    nesting level 1
    this is the local scope - all the variables and disjoints a user provided in the "Variables" and "Disjoints" fields.
    in the code this is called wrkCtx - working context.
    $)

    $v local_var1 local_var2 $.
    local_var1_def $f class local_var1 $.
    local_var2_def $f class local_var2 $.
    $d local_var1 local_var2 $.

    ${
        $(
        nesting level 2
        this is the proof scope - all the hypotheses from the editor.
        in the code this is called proofCtx - the bottom-up prover works in this context.
        $)

        hypothesis1 $e ... $.
        hypothesis2 $e ... $.
    $}
$}

Mm-lamp can emulate the mmj2 approach by allowing only local variables to be replaced. Currently, it doesn't do that. But most probably I will do that for the full unification to keep it simple.

When you are doing proof search and want to apply a theorem, or a theorem is used in the worksheet, the first thing you do is pull the theorem statement and replace all the variables in the theorem's frame by fresh metavariables.

Mm-lamp does the same, but it adds usual local variables instead of the metavariables.

(Actually, (3) and (4) are not even proper simultaneous substitutions, since G2 appears on the left and right of a rule in (3). I'm going to assume that's a typo and you meant to use G3 → G2 in (3) and G2 → G3 in (4).)

That was not a typo, mm-lamp allows not proper simultaneous substitutions. Here is a video demonstration https://drive.google.com/file/d/1VOjVKDHQ6S2X7Ylb-vG1-LZDpua094bO/view?usp=sharing Here is the mm-lamp state before the replacement:

 {"srcs":[{"typ":"Local","fileName":"demo0.mm","url":"","readInstr":"ReadAll","label":"","resetNestingLevel":true,"allLabels":[]}],"descr":"","varsText":"","disjText":"","stmts":[{"label":"2","typ":"p","isGoal":false,"cont":"|- ( t + 0 ) = t","jstfText":": a2"},{"label":"1","typ":"p","isGoal":false,"cont":"|- ( ( t + 0 ) = t -> ( ( t + 0 ) = t -> t = t ) )","jstfText":": a1"},{"label":"3","typ":"p","isGoal":false,"cont":"|- ( ( t + 0 ) = t -> t = t )","jstfText":"2 1 : mp"},{"label":"teqt","typ":"p","isGoal":true,"cont":"|- t = t","jstfText":"2 3 : mp"}]}

I suppose there is a chance that not proper simultaneous substitutions result in invalid proofs, but I have not done a deep analysis of this. I implemented this only based on my intuition.

Therefore (3), (4), and presumably everything else in your "etc." is rejected as not-best solutions to the constraints, and (1) and (2) are equivalent so you can return either one.

Thanks for the explanation. I plan to implement the recursive algorithm you've explained earlier. A single result is fine.

full unification when pressing the "unify" icon. This could be controlled by a setting, but mmj2 manages to do this quickly (on control-U) on even long proofs, so I suspect this would just stay "always on". This would mean someone could edit a statement to change just one instance of a given work variable, unify, & then suddenly that work variable would be replaced everywhere (without using "replace"). This is functionality mmj2 users have become used to.

How does mmj2 unify more than 2 statements? I suppose it applies the unification algorithm to different pairs of statements. But then the result may depend on the order in which statements were selected for the unification. For example:

stmt1: &A ~ &B
stmt2: C ~ D
stmt3: &B ~ &A

If I first unify stmt1 with stmt2, then I get: &A:=C and &B:=D. But if I unify stmt3 with stmt2, then I get: &A:=D and &B:=C.

I'm guessing a single full unification implementation could be used in all of these cases. Agreed? What am I missing?

I am not sure. The full unification for the global substitution and in the bottom-up prover is more or less simple. But the third case (pressing the "unify" icon) doesn't look as simple to me. Anyway, I will start implementing the first two cases and will look into the third one later.

expln commented 1 year ago

I suppose there is a chance that not proper simultaneous substitutions result in invalid proofs, but I have not done a deep analysis of this.

One such scenario is when there are global essential statements (i.e. loaded from external sources). If such an improper simultaneous substitution tries to replace a variable in one or few global essential statements this may lead to errors because global essential statements are read-only, mm-lamp doesn't modify them during the global substitution. But this is a bug, and I will fix it by prohibiting such scenarios.

expln commented 1 year ago

How does mmj2 unify more than 2 statements? I suppose it applies the unification algorithm to different pairs of statements. But then the result may depend on the order in which statements were selected for the unification.

I think the approach should be to find which single statement has changed and unify it with its previous version.

digama0 commented 1 year ago

I don't know what the distinction you are making between "local" and "global" variables is. Do you have an example?

Unlike mmj2, mm-lamp doesn't have metavariables. Mm-lamp uses only usual variables and constants, and it can substitute usual variables with expressions if mm-lamp can prove the expression is of the same type as the variable. "Local" and "global" have similar meaning as in other programming languages. Internally, mm-lamp uses a context with three levels of nesting: ...

Mm-lamp can emulate the mmj2 approach by allowing only local variables to be replaced. Currently, it doesn't do that. But most probably I will do that for the full unification to keep it simple.

As far as the proof assistant is concerned, you should be treating global and local variables as fixed, not subject to replacement. If you want to notate metavariables the same way as local variables that's your choice, but it is liable to create issues like replacement of the variables in the hypothesis which is not correct (as you note later). In that case the easiest way to get correct behavior is to collect up the list of variables in the hypothesis & conclusion, and mark them as fixed, and then treat all other variables as metavariables.
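
A sketch of that classification step (TypeScript, assumed names; statements are kept here simply as arrays of symbols): collect the variables occurring in the goal and the hypotheses and mark them as fixed, and treat every other variable of the current context as substitutable.

function classifyVariables(
  goal: string[], hypotheses: string[][], allVariables: Set<string>
): { fixed: Set<string>; metavariables: Set<string> } {
  const fixed = new Set<string>();
  for (const sym of goal) if (allVariables.has(sym)) fixed.add(sym);
  for (const hyp of hypotheses)
    for (const sym of hyp) if (allVariables.has(sym)) fixed.add(sym);
  const metavariables = new Set<string>();
  for (const v of allVariables) if (!fixed.has(v)) metavariables.add(v);
  return { fixed, metavariables };
}

// e.g. with goal "|- ( ph -> ch )" and hypotheses "|- ( ph -> ps )" and
// "|- ( ps -> ch )", the variables ph, ps, ch come out fixed, while a work
// variable such as &W1 that appears only in intermediate steps ends up in the
// metavariable set.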

When you are doing proof search and want to apply a theorem, or a theorem is used in the worksheet, the first thing you do is pull the theorem statement and replace all the variables in the theorem's frame by fresh metavariables.

Mm-lamp does the same, but it adds usual local variables instead of the metavariables.

Be careful not to use variable names used for fixed variables in that case!

(Actually, (3) and (4) are not even proper simultaneous substitutions, since G2 appears on the left and right of a rule in (3). I'm going to assume that's a typo and you meant to use G3 → G2 in (3) and G2 → G3 in (4).)

That was not a typo, mm-lamp allows not proper simultaneous substitutions. Here is a video demonstration https://drive.google.com/file/d/1VOjVKDHQ6S2X7Ylb-vG1-LZDpua094bO/view?usp=sharing

The example shown is a valid simultaneous substitution of ( t + t ) for t. You can do this, but it is confusing to talk about and generally easier to explain when the variables before and after are different, because they are unrelated even if they share the same spelling.

I understand better what you mean by (3) now, interpreted as a simultaneous substitution it is indeed a unifier producing ( G1 ~ G2 ) ~ G3 on both sides. This is however equivalent to (1) and (2) because it is the same but for the variable names, and similarly for (4). There is no need to enumerate all alpha equivalent variations on the unique mgu, you can just take the first one you find.

How does mmj2 unify more than 2 statements? I suppose it applies the unification algorithm to different pairs of statements. But then the result may depend on the order in which statements were selected for the unification. For example:

stmt1: &A ~ &B
stmt2: C ~ D
stmt3: &B ~ &A

If I first unify stmt1 with stmt2, then I get: &A:=C and &B:=D. But if I unify stmt3 with stmt2, then I get: &A:=D and &B:=C.

This is a failure, no matter what order you do things, unless C and D are themselves variables which can be unified. If you first unify stmt1 with stmt2 you get &A := C, &B := D, then you need to unify stmt3 with stmt2 meaning C ~ D =?= D ~ C, so you do C =?= D, which is a failure if C and D are constants and otherwise would assign C := D or the like such that all three statements read C ~ C. Once you have determined that the unification is a failure it doesn't really matter what partial assignment to the metavariables you came up with because it's inconsistent anyway.
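
A deliberately simplified sketch of that sequential process (TypeScript, assumed names): it treats statements as flat token lists and only lets a metavariable stand for a single symbol, which is enough for the &A/&B example above but not for real tree-shaped expressions. The point is that every pair extends the same accumulated substitution, so the failure is reached regardless of the order in which the pairs are tried.

type TokenSubst = Map<string, string>;
const isMvar = (sym: string) => sym.startsWith("&");

// Resolve a symbol through the bindings accumulated so far.
function resolve(sym: string, s: TokenSubst): string {
  while (isMvar(sym) && s.has(sym)) sym = s.get(sym)!;
  return sym;
}

// Unify two same-length token lists, extending one shared substitution.
function unifyFlat(a: string[], b: string[], s: TokenSubst): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    const x = resolve(a[i], s), y = resolve(b[i], s);
    if (x === y) continue;
    if (isMvar(x)) { s.set(x, y); continue; }
    if (isMvar(y)) { s.set(y, x); continue; }
    return false;                      // two distinct constants: no unifier
  }
  return true;
}

const s: TokenSubst = new Map();
unifyFlat(["&A", "~", "&B"], ["C", "~", "D"], s);   // true: &A := C, &B := D
unifyFlat(["&B", "~", "&A"], ["C", "~", "D"], s);   // false: would need D = C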

I suppose there is a chance that not proper simultaneous substitutions result in invalid proofs, but I have not done a deep analysis of this.

One such scenario is when there are global essential statements (i.e. loaded from external sources). If such an improper simultaneous substitution tries to replace a variable in one or few global essential statements this may lead to errors because global essential statements are read-only, mm-lamp doesn't modify them during the global substitution. But this is a bug, and I will fix it by prohibiting such scenarios.

Not just loaded from external sources. Changing the theorem statement is never a legal result of unification. It might be a user action a la find/replace but unless the user is explicitly selecting to do so changing the goal is not ok.

expln commented 1 year ago

As far as the proof assistant is concerned, you should be treating global and local variables as fixed, not subject to replacement. If you want to notate metavariables the same way as local variables that's your choice, but it is liable to create issues like replacement of the variables in the hypothesis which is not correct (as you note later). In that case the easiest way to get correct behavior is to collect up the list of variables in the hypothesis & conclusion, and mark them as fixed, and then treat all other variables as metavariables.

Not just loaded from external sources. Changing the theorem statement is never a legal result of unification. It might be a user action a la find/replace but unless the user is explicitly selecting to do so changing the goal is not ok.

This still might be useful, for example, to replace one variable with another (renaming of a variable). But I agree, this might be confusing. I think it makes sense to warn the user when a substitution is going to change a hypothesis or the goal statement.

There is no need to enumerate all alpha equivalent variations on the unique mgu, you can just take the first one you find.

Yes, that's what I am going to do :)

Thanks for the explanation of my example with unification of three statements. Probably I will ask more questions about it later, when I start working on that functionality.

digama0 commented 1 year ago

As far as the proof assistant is concerned, you should be treating global and local variables as fixed, not subject to replacement. If you want to notate metavariables the same way as local variables that's your choice, but it is liable to create issues like replacement of the variables in the hypothesis which is not correct (as you note later). In that case the easiest way to get correct behavior is to collect up the list of variables in the hypothesis & conclusion, and mark them as fixed, and then treat all other variables as metavariables.

Not just loaded from external sources. Changing the theorem statement is never a legal result of unification. It might be a user action a la find/replace but unless the user is explicitly selecting to do so changing the goal is not ok.

This still might be useful, for example, to replace one variable with another (renaming of a variable). But I agree, this might be confusing. I think it makes sense to warn the user when a substitution is going to change a hypothesis or the goal statement.

I think it is about as meaningful to replace a variable from the theorem statement as it is to replace occurrences of + with - everywhere in the proof. It's not really a variable at all, it's a local constant. It is of course sometimes useful to replace subterms like + with -, with or without syntax-aware assistance (which is to say, you can also just have a text search to do this), but it is an entirely different activity from unification and you should not confuse it with such.

expln commented 1 year ago

@david-a-wheeler @digama0

I implemented the full unification during global substitution (the "substitution" icon), and automatic merging of similar steps. It is available on dev https://expln.github.io/lamp/dev/index.html

With these two new features the example from my first demo becomes simpler - https://drive.google.com/file/d/1YM3EAbJeWJglWITkEZ-FxCJpDgnbkzt1/view?usp=sharing

Here are some explanations of decisions I made.

Automatic merging of similar steps is optional. There is a global setting "Merge similar steps automatically", which is On by default. It can merge only in simple cases: 1) both similar steps must be P or G (not H); 2) one of them must have a non-empty justification and the other must have an empty justification. Only in that case is the one with the empty justification removed, with all references to it replaced by the label of the remaining step. If automatic merging is not possible, then manual merging is still available and works as before (the Merge icon). After I composed this comment I realised that I have to tweak the "auto merge" feature so it does not remove the G step. I will do this.
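
As a sketch of that rule (TypeScript, assumed names; not mm-lamp's actual code), the eligibility check might look like:

type StepType = "H" | "P" | "G";       // hypothesis, provable, goal
interface Step { type: StepType; label: string; jstf: string }

// Two steps with identical statements are auto-merged only when neither is a
// hypothesis and exactly one of them already carries a justification.
function canAutoMerge(a: Step, b: Step): boolean {
  if (a.type === "H" || b.type === "H") return false;
  return (a.jstf.trim() === "") !== (b.jstf.trim() === "");
}

// The step with the empty justification is removed and references to its label
// are redirected to the remaining step. (The tweak mentioned above, keeping the
// G step even when its justification is empty, is not reflected in this sketch.)
function stepToRemove(a: Step, b: Step): Step {
  return a.jstf.trim() === "" ? a : b;
}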

The full unification during global substitution is also optional. But the choice is remembered for the next invocation of the substitution dialog, so if you always prefer to use the full unification you don't have to change anything each time.

During implementation of the full unification algorithm I understood the importance of separating variables into "local constants" and metavariables. It was not so obvious to me from just reading the algorithm Mario explained. Because of that I introduced another new global setting, "Prefix of metavariables in unification". By default, it is &. It means that during the full unification (in the global substitution now, and in bottom-up proving and on pressing the "unify" icon when that is implemented) all the variables with names starting with this prefix will be considered metavariables, and all other variables will be "local constants".

I could have skipped introducing this setting and considered all the variables defined in the editor as metavariables. But I think I will need both kinds of variables in my future proofs, because I plan to create theorems with a lot of local variables (i.e. defined within frame blocks ${ $}) with diverse names for better readability. However, this setting should not make things more complex for those who are used to mmj2. With the default settings, i.e. when new variable names start with &, the full unification will work very similarly to mmj2: all the variables defined in the editor will be metavariables for the full unification.

Another thing I understood during implementation of the full unification algorithm is that the name of the "Unify all" button (the Hub icon) may be misleading. Initially I thought that unification means connecting all the steps by finding appropriate justifications. But now I understand that unification might mean the very specific process of finding the mgu. So maybe it's worth renaming the "Unify all" button to something else? For example, "Justify all" with the tooltip "Justify all steps or justify selected provable bottom-up"?

digama0 commented 1 year ago

By the way, regarding metavariable prefixes, my personal preference is to use names like ?a, ?b, or if you prefer, ?1, ?2, without indicating explicitly what sort the metavariables belong to. & isn't really very evocative of "something yet to be figured out by the prover", and I keep it in mmj2 mainly for backward compatibility - there isn't a good reason to break saved worksheets for this.

Another thing I understood during implementation of the full unification algorithm is that the name of the "Unify all" button (the Hub icon) may be misleading. Initially I thought that unification means connecting all the steps by finding appropriate justifications. But now I understand that unification might mean the very specific process of finding the mgu. So maybe it's worth renaming the "Unify all" button to something else? For example, "Justify all" with the tooltip "Justify all steps or justify selected provable bottom-up"?

The "unify" operation in mmj2 is indeed quite overloaded. I would describe it more generally as "make everything coherent and conformant": If steps are notated with an abbreviated syntax, use the canonical syntax; if steps are missing a formula, fill it in; if steps are missing a justification (and the settings are such that we should try to fill it in), fill it in if we can; if formulas have metavariables which can be nontrivially unified, assign and rewrite the metavariables; if all the steps are done, add the final proof block.

expln commented 1 year ago

I agree that ? suits metavariables better. It is possible to achieve the ?1, ?2, ?3 naming by changing a few settings:

[screenshot of the relevant settings]

I don't want to change defaults to this because there are already videos and documentation with & prefixes and also it looks similar to mmj2.

The "unify" operation in mmj2 is indeed quite overloaded. I would describe it more generally as "make everything coherent and conformant": If steps are notated with an abbreviated syntax, use the canonical syntax; if steps are missing a formula, fill it in; if steps are missing a justification (and the settings are such that we should try to fill it in), fill it in if we can; if formulas have metavariables which can be nontrivially unified, assign and rewrite the metavariables; if all the steps are done, add the final proof block.

Good to know. Thanks.

digama0 commented 1 year ago

I don't want to change defaults to this because there are already videos and documentation with & prefixes and also it looks similar to mmj2.

How did that happen? Didn't you just implement this a few hours ago? In previous versions I only see regular variables, you yourself said that mm-lamp has no concept of metavariables (until now, I guess). I don't think visual similarity to mmj2 should be treated as a goal FWIW.

expln commented 1 year ago

I am not sure what you are asking about. But I have not changed anything since my announcement of the readiness of the full unification on dev. If you are asking how it happened that the old setting works together with the new one, then I think this is just a coincidence. These settings are "independent" and "don't know about each other". But the way they are implemented produces this overall result.

The setting Type/Color/Prefix has not changed from the very beginning. When mm-lamp needs to introduce a new variable, it takes the type of this variable, goes to this setting and takes the prefix for this type, and finally concatenates the prefix with an integer so that prefix+int is a symbol not already in use (it iterates over an increasing sequence of integers to find the lowest suitable one).
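
A sketch of that naming rule (TypeScript, assumed names; not mm-lamp's actual code):

// Take the prefix configured for the variable's type and append the lowest
// integer that produces a symbol not already in use.
function newVarName(prefixForType: string, usedSymbols: Set<string>): string {
  for (let i = 1; ; i++) {
    const candidate = `${prefixForType}${i}`;
    if (!usedSymbols.has(candidate)) return candidate;
  }
}

// e.g. newVarName("&W", new Set(["&W1", "&W2"])) === "&W3"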

The new setting "Prefix of metavariables in unification" is used only in the new unification algorithm. When I was implementing it, I faced a problem: it worked similarly to matching, finding substitutions only in one direction. Then I noticed your explanation "all other equality checks should be performed modulo this assignment...". So I implemented "you should rewrite the whole state to replace occurrences of mv with e". This is the place where I had to distinguish "local constants" and metavariables. I decided to distinguish them by name and introduced the setting "Prefix of metavariables in unification". I cannot say that I introduced the concept of metavariables to the entire mm-lamp; the concept is used only inside the new function which unifies two expressions.

When that new unify() function finds a substitution, its work ends and the further processing is done by the old code. The old code takes this substitution and performs disjoint checks and type checks. When a user presses the "Apply" button, then again the old code, which knows nothing about metavariables, applies the substitution as a simultaneous substitution of all the variables. The variables not shown in the substitution dialog are substituted with themselves (the same way as it worked prior to this last change), for example ph -> ph, ps -> ps.

By saying "I don't want to change defaults" I didn't mean any big change. I meant that I plan to use the symbol & by default in those two settings. So it feels more familiar to users used to mmj2 and to newcomers who are just starting by reading the documentation and watching the videos (I mean videos for both mm-lamp and mmj2). But experienced users may change those settings and I expect it to work fine too.

david-a-wheeler commented 1 year ago

I implemented the full unification during global substitution (the "substitution" icon), and automatic merging of similar steps. It is available on dev https://expln.github.io/lamp/dev/index.html

Great, this is definitely a step forward.

I tried it out. Starting from this state I can use "search" to add a use of ax-2, select the new step 4 and step 3, click on "substitute", and (after selecting unify mode) unify. I can then use a bottom-up proof on step 2 to complete the proof. So it definitely does work.

I think there are many other opportunities to use the full unification algorithm.

If I start over and search to add an instance of ax-2, I can't select and merge steps 3 and 4 (and cause full unification).

Perhaps more importantly, I can't edit a statement where the justification is known, and cause full unification. That's something many mmj2 users depend on. E.g., after starting over and adding an instance of ax-2, I can't edit the statement in the new step 4, to replace just one &W3 with ph.

E.g., if I change the step 4 statement to this and then unify:

|- ( ( &W3 -> ( &W4 -> &W2 ) ) -> ( ( ph -> &W4 ) -> ( &W3 -> &W2 ) ) )

I see this error:

Could not find a unification for assertion:
|- ( ( ph -> ( ps -> ch ) ) -> ( ( ph -> ps ) -> ( ph -> ch ) ) )
→
|- ( ( &W3 -> ( &W4 -> &W2 ) ) -> ( ( ph -> &W4 ) -> ( &W3 -> &W2 ) ) )

Whereas I expect it to produce:

|- ( ( ph -> ( &W4 -> &W2 ) ) -> ( ( ph -> &W4 ) -> ( ph -> &W2 ) ) )

expln commented 1 year ago

Thank you for testing the new feature.

If I start over and search to add an instance of ax-2, I can't select and merge steps 3 and 4 (and cause full unification).

Merge can only eliminate one of two identical statements. Unification of two statements is done the way you described previously - select the new step 4 and step 3, click on "substitute"... If you mean that I could add such a feature to the merge button, then I will think about that. But this would overload the button. The approach with the "substitute" dialog is not much harder.

Perhaps more importantly, I can't edit a statement where the justification is known, and cause full unification. That's something many mmj2 users depend on. E.g., after starting over and adding an instance of ax-2, I can't edit the statement in the new step 4, to replace just one &W3 with ph.

This could be achieved with the "substitute" dialog. Click on &W3 so it becomes selected. Open the "substitute" dialog. &W3 will appear in the upper text field. Type whatever you want to substitute it with in the lower text field. And so on. This will lead to the result you expect and this is not much harder.

digama0 commented 1 year ago

This could be achieved with the "substitute" dialog. Click on &W3 so it becomes selected. Open the "substitute" dialog. &W3 will appear in the upper text field. Type whatever you want to substitute it with in the lower text field. And so on. This will lead to the result you expect and this is not much harder.

Yes, but IIUC that was the status quo before this feature. I would consider it an important case of unification that it can handle unifying |- ( ( &W3 -> ( &W4 -> &W2 ) ) -> ( ( ph -> &W4 ) -> ( &W3 -> &W2 ) ) ) against the theorem statement supplied in the justification. In particular, this should not involve identifying that this is just the original statement with one occurrence replaced - it should also be able to handle

|- ( ( &W3 -> ( ps -> &W2 ) ) -> ( ( ph -> &W4 ) -> ( &W3 -> &W2 ) ) )

(two variables replaced) or

|- ( ( &W3 -> ( &W3 -> &W2 ) ) -> ( ( ph -> &W4 ) -> ( &W3 -> &W2 ) ) )

(replacing &W4 with &W3 and &W3 with ph, thus unifying both to ph). The approach of replacing one variable at a time isn't really suitable since you might have determined that the entire subterm ( &W3 -> &W4 ) should be replaced with ( x = y -> ( x + 2 ) = ( y + 2 ) ) (perhaps copied from elsewhere in the proof). The point of unification is to figure out what has to be replaced, using matching is just making the user do the work.

expln commented 1 year ago

I think mm-lamp is already able to handle all these cases. However, it does so in a slightly different way. Could you please verify whether the below sequence of steps is functionally equivalent to what you described?

  1. Select a single step.
  2. Click the "Replace button" (the letter A with an arrow below it). The selected statement will appear in the upper text field.
  3. Copy all the text from the upper text field to the lower one. (I can add a button to do this copying, so as not to overwrite what the clipboard currently contains.)
  4. Modify the statement in the lower text field.
  5. Make sure "Unification" is selected as the algorithm to find substitutions and click the "Find substitution" button (or just press Enter if one of the text fields is currently focused).
  6. If a substitution is found, and it corresponds to your expectations then click the "Apply" button.

You may use this link to load the state to use for the verification (the important part is to stop before bj-0, otherwise some statements will be incorrectly parsed by mm-lamp)

Here are screenshots showing how I reproduced all three cases you described:

two variables replaced: [screenshot]

replacing &W4 with &W3 and &W3 with ph, thus unifying both to ph: [screenshot]

( &W3 -> &W4 ) should be replaced with ( x = y -> ( x + 2 ) = ( y + 2 ) ): [screenshot]

digama0 commented 1 year ago

Yes, that is what I mean. The main difference is that mmj2 would infer the "unify what" field from the theorem label, by replacing all the variables in the theorem statement by fresh metavariables.

More generally, I think there is much to like about the way in which unification can be used as a zero-configuration tool. In mmj2, it's literally just one "do everything" button that users are quickly trained to press frequently. The dialog here would require two copy pastes, an edit, and three button presses (to open the dialog, press "find substitution" and then "apply") instead of an edit and a key command "unify all".

expln commented 1 year ago

The main difference is that mmj2 would infer the "unify what" field from the theorem label, by replacing all the variables in the theorem statement by fresh metavariables.

Yes, in mm-lamp one additional step is required - adding an assertion to the editor by clicking the "Search" button.

I am not trying to replace mmj2 with mm-lamp or to compete with it :) But I will try to make such actions more effortless in mm-lamp. Thanks for the feedback!

david-a-wheeler commented 1 year ago

Merge can only eliminate one of two identical statements. Unification of two statements is done the way you described previously - select the new step 4 and step 3, click on "substitute"... If you mean that I could add such a feature to the merge button, then I will think about that.

Yes, please. If I click on 2 statements and click on merge, I think users would expect the tool to do it if it's possible.

I think there is much to like about the way in which unification can be used as a zero-configuration tool. In mmj2, it's literally just one "do everything" button that users are quickly trained to press frequently.

I agree with @digama0 - unification is a powerful general-purpose algorithm that can do a number of actions automatically, and I'd really like to see it used so that a single change in one place can cause automatic replacements like this.

expln commented 1 year ago

@digama0

Hi Mario,

Can you explain how MM1 builds syntax trees? Currently, I have performance problems with the algorithm I use in mm-lamp. My algorithm for building a syntax tree for an expression, for example s1 s2 s3 ..., is as follows: 1) try to prove each of the statements class s1 s2 s3 ..., setvar s1 s2 s3 ..., wff s1 s2 s3 ... until a proof is found; 2) if a proof is found, transform it into a syntax tree.

Transforming a proof into a syntax tree is a straightforward process because the proof is already a tree. Finding a proof is a much more expensive process. I am using the same algorithm as in mm-lamp's bottom-up prover to find syntax proofs, which tries to apply all possible assertions to build a proof bottom-up. If needed I can provide more details on this algorithm, but I think this should be enough to understand why it is not very fast.

This algorithm works sufficiently well for a small number of statements. But now I want to build syntax trees for all assertions in set.mm, and that will be very slow.

digama0 commented 1 year ago

The process of building syntax trees from expressions is called parsing. There are a few ways you can do it, but the simplest is to put the syntax axioms into a trie and do a general backtracking algorithm. This can perform poorly on highly ambiguous grammars, but set.mm is unambiguous by design, and even though some backtracking is still needed because things might only be resolved after a while, it's not so bad in practice on set.mm.

A trie node has the following data:
    • a map from constant symbols to child nodes,
    • a list of (variable typecode, child node) pairs, and
    • an optional label of a syntax axiom to "reduce" with when a rule ends at this node.

The top level of the trie is a map from typecodes to trie nodes. Every path through the tree represents a sequence of constants and variables, and during grammar initialization (which you can do either up front or as you encounter syntax axioms) you travel down the tree and add the syntax axiom. For example, starting from the empty tree:

T: {}

we can insert axiom wi, that is wff ( ph -> ps ), to produce the tree:

T: {'wff': N0}
N0: {const '(': N1}
N1: {var 'wff': N2}
N2: {const '->': N3}
N3: {var 'wff': N4}
N4: {const ')': N5}
N5: {reduce 'wi'}

If we insert wa $a wff ( ph /\ ps ) and wn $a wff -. ph into the tree as well, we get:

T: {'wff': N0}
N0: {const '(': N1, const '-.': N9}
N1: {var 'wff': N2}
N2: {const '->': N3, const '/\': N6}
N3: {var 'wff': N4}
N4: {const ')': N5}
N5: {reduce 'wi'}
N6: {var 'wff': N7}
N7: {const ')': N8}
N8: {reduce 'wa'}
N9: {var 'wff': N10}
N10: {reduce 'wn'}
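
A sketch of the grammar-initialization step described above (TypeScript, illustrative names; the Lean code below only shows the parsing side): walk down the trie along the symbols of a syntax axiom, creating nodes as needed, and record the axiom's label at the final node.

type TrieNode = {
  constEdges: Map<string, TrieNode>;   // the {const '(': N1} edges above
  varEdges: Map<string, TrieNode>;     // the {var 'wff': N2} edges above
  reduce?: string;                     // the {reduce 'wi'} marker above
};
const newNode = (): TrieNode => ({ constEdges: new Map(), varEdges: new Map() });

type Sym = { kind: "const"; sym: string } | { kind: "var"; typecode: string };

// trie: the top-level map from result typecodes (e.g. "wff") to trie roots.
function insertSyntaxAxiom(
  trie: Map<string, TrieNode>, label: string, typecode: string, body: Sym[]
): void {
  let node = trie.get(typecode);
  if (node === undefined) { node = newNode(); trie.set(typecode, node); }
  for (const s of body) {
    const edges = s.kind === "const" ? node.constEdges : node.varEdges;
    const key = s.kind === "const" ? s.sym : s.typecode;
    let next = edges.get(key);
    if (next === undefined) { next = newNode(); edges.set(key, next); }
    node = next;
  }
  node.reduce = label;
}

// Inserting wi, wa and wn as in the example:
const trie = new Map<string, TrieNode>();
const C = (sym: string): Sym => ({ kind: "const", sym });
const V = (typecode: string): Sym => ({ kind: "var", typecode });
insertSyntaxAxiom(trie, "wi", "wff", [C("("), V("wff"), C("->"), V("wff"), C(")")]);
insertSyntaxAxiom(trie, "wa", "wff", [C("("), V("wff"), C("/\\"), V("wff"), C(")")]);
insertSyntaxAxiom(trie, "wn", "wff", [C("-."), V("wff")]);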

Having prepared a data structure for parsing, now suppose we have a string of constants and variables we want to parse. Here's a rendering of the algorithm as Lean code:

inductive Node where
  | mk (const : String → Option Node)
       (var : List (String × Node))
       (done : Option String)

def Trie := String → Option Node

def Node.empty : Node := ⟨fun _ => none, [], none⟩

inductive Sym where
  | const (_ : String)
  | var (_ : String)

inductive Expr where
  | var (_ : String)
  | app (_ : String) (_ : Array Expr)

-- Given a variable symbol like "v", returns ("setvar", "wv"),
-- i.e. the typecode and the variable hypothesis for the symbol
def getVarHyp : String → String × String := sorry

variable (p : Trie) (input : Array Sym) in
mutual
partial def parseC (q : Trie) (c : String) (pos : Nat) (args : Array Expr)
    (k : Expr → Nat → Option Expr) : Option Expr :=
  parsePT pos args k ((q c).getD .empty)

partial def parseV (pos : Nat)
    (args : Array Expr) (k : Expr → Nat → Option Expr) :
    List (String × Node) → Option Expr
  | [] => none
  | (s', q) :: vs =>
    (parse s' pos fun v f' => parsePT f' (args.push v) k q)
    <|> parseV pos args k vs

partial def parsePT (pos : Nat) (args : Array Expr)
    (k : Expr → Nat → Option Expr) : Node → Option Expr
  | ⟨cs, vs, d⟩ =>
    (match input[pos]? with
    | some (.const c) => parseC cs c (pos + 1) args k
    | _ => none)
    <|> parseV pos args k vs
    <|> (d.bind fun a => k (.app a args) pos)

partial def parse (tc : String) (pos : Nat) (k : Expr → Nat → Option Expr) : Option Expr :=
  (match input[pos]? with
  | some (.var v) =>
    let (tc', v') := getVarHyp v
    if tc == tc' then k (.var v') (pos + 1) else none
  | _ => none)
  <|> parseC p tc pos #[] k
end

def parseFmla (s : String) (t : Trie) (input : Array Sym) : Option Expr :=
  parse t input s 0 fun e pos => if pos == input.size then some e else none

Described in more detail:
    • parse tries to consume a single variable of the requested typecode at the current position, and falls back to matching a syntax rule for that typecode from the top of the trie.
    • parseC follows the edge labelled with the constant (or, at the top level, the typecode) just read.
    • parseV tries each variable edge in turn by recursively parsing a subexpression of that edge's typecode and pushing the result onto the argument list.
    • parsePT, at each node, prefers a constant edge matching the next input symbol, then tries the variable edges, and finally, if a rule ends at this node, reduces by passing the completed application node to the continuation.

Each individual case is fairly straightforward given the data structure; I can explain in more detail if required. This is also a transcription of the algorithm https://github.com/digama0/mm0/blob/master/mm0-hs/src/MM0/FromMM/Parser.hs#L575-L612 , although that one assumes lazy evaluation and so hopefully this one is more suitable to be adapted to your context.

expln commented 1 year ago

Thank you so much, Mario!

I can explain in more detail if required.

Thanks, I need some time to understand this and probably then I will ask additional questions.

david-a-wheeler commented 1 year ago

@digama0 - as always, thanks so much for your help!!

expln commented 1 year ago

@digama0

the simplest is to put the syntax axioms into a trie and do a general backtracking algorithm. This can perform poorly on highly ambiguous grammars, but set.mm is unambiguous by design, and even though some backtracking is still needed because things might only be resolved after a while, it's not so bad in practice on set.mm.

I understood the algorithm. But I have not implemented it yet because it will take some time. I am wondering how much time this algorithm could take to parse all assertion statements in set.mm (only the resulting statements of axioms and theorems, without hypotheses). Do you have any statistics from your implementation of how much time it takes to parse set.mm? I've done some improvements to my current algorithm in mm-lamp, so now it takes about 40 seconds to parse set.mm (previously it was about an hour).

digama0 commented 1 year ago

The lower bound of this parsing algorithm is definitely sub-second if done well. I doubt knowing that will be useful to you though.

david-a-wheeler commented 11 months ago

Metamath-lamp has grown into a very useful tool. I personally think this issue is now one of the most important issues holding it back. While there is always room for better automation, full unification would automate a lot (compared to the current state) and is 100% general.

david-a-wheeler commented 11 months ago

Having this would certainly simplify the tutorial! :-)

expln commented 11 months ago

I started working on full unification a few months ago. But it turned out to be much harder than I imagined. I needed to change the parts of the code which I don't want to change (at least for now). So, I postponed this activity. Currently I am implementing automation. My current results in automation are promising, so I want to release it first. After that, I will try to do another attempt to implement full unification. Probably I will use that new automation to implement the full unification.

david-a-wheeler commented 3 weeks ago

I would love to see this. This is extremely general & is itself a form of automation.

I appreciate it's not easy. If there's something I can do to help, let me know.

expln commented 3 weeks ago

I guess I need help in comparing mmj2 behavior with mm-lamp, because I don’t know mmj2 well. I will try to come up with some solution to the problem you started this issue with (when you typed ax-2 and pressed Ctrl-U). When I implement some analogous solution in mm-lamp, I will ask you to come up with another example of what is possible in mmj2 and not possible in mm-lamp.