String interpolation for HOL

loewenheim commented 8 years ago

The Users' Guide currently mentions that hol formulas can be parsed in a similar manner as fol formulas, via a function parseLLKFormula. Is it worthwile to implement a string interpolation version of this function (say ph)?

quicquid commented 8 years ago

I think it depends if we find the llk formula syntax(a generalization of the prover9 syntax) convenient enough to make it the default. At the moment, the type declarations need to be always there, but I could also produce a variant which takes an implicit signature. When writing a proof script, one would do something like:

implicit val signature = sig"var x : (o>o)>o; var y :o;"
val f = ph"x( (\\ y => -y) )"

with ph using a parsing function similar to parse(formula : String)(implicit sig : Signature). Internally, this is already handled like that, i just was avoiding implicits so far and just appended the formula parser to the signature parser. I think the main drawback of the parser is that two bound variables x need to have the same type - don't know how much of a restriction that is.

quicquid commented 8 years ago

I'm just testing a modification of the code which I will push later that evening:

gapt> implicit val sig = LLKFormulaParser.ps("var x : o;")
sig: at.logic.gapt.formats.llkNew.LLKTypes.LLKSignature = Map(x -> x)

gapt> val f = LLKFormulaParser.pf("x")
f: at.logic.gapt.expr.LambdaExpression = x

gapt> val g = LLKFormulaParser.pf("x&x")
g: at.logic.gapt.expr.LambdaExpression = (x∧x)

If you find this convenient to use in the tactics language, I'd be happy about an adaption of the interpolation code.

gebner commented 8 years ago

FWIW, I'm currently looking into an input language with type inference. That should make it possible to omit types almost everywhere and also use the same parser for FOL and HOL. (I have an AST and type inference, its mostly the parser that's missing.)

As for the name, I don't think ph (short for Parse HOL?) is a good choice. Similar to FOL, we need at least different interpolators for formulas and expressions (variables might also be nice).

I'm not a big fan of the p (for parse) either, parsing is kind of expected when providing a string--IMHO the name should just indicate the type of object. My proposal would be to call them fof, fot, foa, fov, hof, hoe, hoa, hov (first-order/higher-order formula/term/atom/variable/expression).

quicquid commented 8 years ago

The LLK AST is basically untyped lambda calculus - I'm on TU next Tuesday, should we sit together and see if we can merge the LLK parser with your AST?

gebner commented 8 years ago

So, driven by a bit of not-invented-here syndrome and the desire to try out a new parsing library (fastparse, this is supposed to be the spiritual successor to the Scala parser combinator library), I have done a quick parser loosely based on and inspired by the LLK syntax, the current state is in the babel branch.

Much fewer parentheses are needed. This is something that has also bugged me with the prover9 parser--it wants parens in various places, and I can never tell where. Compare (\x => (all y exists z x(y, z))) with ^x !y ?z x(y,z) or even ^x !y ?z x y z.
I have added abbreviations for all (!) and exists (?) with inspiration from TPTP: !x?y p(x) -> p(y)
The character ^ for lambda is chosen so that you can type it in a string without needing escape characters: compare "(\\x => x)" with "^x x"
Types can be annotated ("ascripted") pretty much everywhere: p(x:o, y), ?(x:o) x, !x (?y p(x,y) : o)
(not implemented yet) LLK-style variable declarations could also be added everywhere: p(const p : i > i; x)
Some silly but potentially useful abbreviations are supported, like a=b=c or a <= b < c = d.
Implication chains are supported: a -> b -> c -> d -> e, a <-> b <-> c works as well.
Pretty much all of the (first-order) prover9 syntax is supported (except <- and v, and there are differences in constant/variable handling)
Safe way to include arbitrary unicode identifiers: '\u2200' (\x p(x)) is the same as !x p(x) :smile: (again, the single quotes were chosen not to interfere with Scala's double quotes)
Most of the LLK syntax is supported, at least I tried with most of the input in LLKASTParserTest. The big missing thing is identifiers that contain backslashes as they collide with lambdas: \delta delta(abc) is parsed as (λdelta:(i -> i).delta(abc))

So much for the parser, now to the AST and type inference code:

Variables and constants are parsed as identifiers first, the decision whether they are variables or constants is only made at the end in the conversion to LambdaExpression. The following heuristic is used: bound identifiers get mapped to variables, free identifiers that match [u-zU-Z].* are variables, everything else is a constant.
Identifiers with the same name in the same scope are required to have the same types (except =, ∀, and ∃), i.e. p (!(x:i) q(x)) (!x:o) x) or !(x:i) !(x:o) x is fine, but !x (x:i>o) (x:i) is not.
Types default to i if they are not constrained otherwise--this way we get first-order formulas without annotations.

I think it would be nice and a good idea to have a uniform textual format for FOL/HOL expressions and formulas. In particular we might finally get a string output that is both human-readable and can be parsed, so that you can finally copy-and-paste command line output into unit tests (I'm really missing this feature). I think it should even be feasible to make the default toString output both human and machine readable.

As for the way forward, if we decide to make the current LLK implementation the default (for both HOL and FOL of course), then we should add type inference and IMHO also the abbreviations for quantifiers described above and less parens to LLK. They make formulas much more readable, at least to me.

If we make the new code the default, then the question is what happens to LLK. It would be really nice if we would have just one parser, and Babel is almost there except for the backslashes in identifiers.

loewenheim commented 8 years ago

That looks very cool. I really appreciate the conciseness compared to the prover9 parser. I have a minor question though: I thought that the convention for operator precedence was that quantifiers bind more strongly than binary connectives, e.g. ∃x P(x) ∨ Q(x) should be read as (∃x P(x)) ∨ Q(x). The parser currently interprets it as ∃x (P(x) ∨ Q(x)). Am I misremembering this?

Also I personally think it would be neat if the parser could handle the symbols ¬ ∨ ∧ → ∃ ∀ λ directly, but I realize that the utility of that feature is pretty limited to most users ;)

gebner commented 8 years ago

I don't really know what the operator precedence in the prover9 parser is, in Coq I think quantifiers bind only relatively weakly--that's where I was going from. But I'm completely open to changing the operator precedences.

Also I personally think it would be neat if the parser could handle the symbols ¬ ∨ ∧ → ∃ ∀ λ directly, but I realize that the utility of that feature is pretty limited to most users ;)

There is actually a real use case here: if we support Unicode operators in the parser, we can output the in the .toString method and still parse it. :smile:

shetzl commented 8 years ago

I agree that ∃x P(x) ∨ Q(x) should be read as (∃x P(x)) ∨ Q(x). This is the standard in the literature. Parsing the unicode characters would be nice.

gebner commented 8 years ago

Just a few things I forgot; if you want to try it out I have added string interpolators:

scala> hof"!x ?y p(x,y)"
res0: at.logic.gapt.expr.HOLFormula = ∀x.∃y.p(x,y)

However, I'm somewhat embarrassed to admit that I don't really know how to implement unification with good error messages, see what you get when the expression is not typeable:

scala> hof"x & x y"
java.lang.IllegalArgumentException: Cannot type-check ((((∧:(o>(o>o))) (x:_2cf2f9d3)) ((x:_5dda766) (y:_213fb0b))):o):
Cannot unify types o and (_213fb0b>_58163152) in
_61f7650e = (o>o)
_5eac5b25 = _2cf2f9d3
_14cb056 = (o>(o>o))
_3994b18b = _213fb0b
_2cf2f9d3 = o
_5dda766 = o
o = (_213fb0b>_58163152)

  at at.logic.gapt.expr.package$P9Helper$$anonfun$hof$extension$1.apply(package.scala:230)
  at at.logic.gapt.expr.package$P9Helper$$anonfun$hof$extension$1.apply(package.scala:230)
  at scalaz.$bslash$div.fold(Either.scala:57)
  at at.logic.gapt.expr.package$P9Helper$.hof$extension(package.scala:229)
  ... 43 elided

Martin: I'm here on Tuesday and have time if you want to meet up.

quicquid commented 8 years ago

@gebner the new parser is impressive! it's definitely easier to use.

Regarding prover 9: I agree, the prover 9 syntax is not very concise - perhaps I'll document the rules in our manual and not only refer to the prover9 homepage(https://www.cs.unm.edu/~mccune/prover9/manual/2009-11A/syntax.html). Basically it is the following:

negation binds the strongest
quantifiers bind stronger than logical operators - unless the argument of a quantifier is not an atom, you need parenthesis (all x (P(x) & Q(x)))
quantifier blocks are always in parenthesis (all x all y exists z P(x,y,z))
conjunction and disjunction are right-associative : the prover9 does not specify if it is right- or left since there it does not matter. if i understand correctly, the logical operators usually associate to the right
=, !=, <=, <, >, >=, +, *, -, / are binary operators. The manual also mentions ^, ', @, \ which are not supported by the parser. In fear of collisions I did not use ^ and @ in LLK, but I think we don't process that many prover 9 files directly that this is a real problem.

gebner commented 8 years ago

For comparison, the situation in the new babel format:

Quantifiers and negation now bind equally strong, and stronger than conjunction.
Quantifier blocks do not need parentheses.
Conjunction and disjunction are left-associative, just as the built-in And and Or constructors are.
=, !=, <=, >, >=, < are binary relational symbols (=, != are special cased to have the correct types)
+, -, *, / are binary operators and bind more strongly than the relational symbols.

The manual also mentions ^, ', @, \ which are not supported by the parser. In fear of collisions I did not use ^ and @ in LLK, but I think we don't process that many prover 9 files directly that this is a real problem.

Is there a reason they are missing from the prover9 parser as well (at least some them), there really shouldn't be any collisions there? But you're right, I think the only binary operator I had to add for the prover9 TSTP import was the infix v.

gebner commented 8 years ago

Ah, top and bottom are also missing at the moment. The unicode symbols are clear, but are there any good suggestions for the ascii representation? I find neither the prover9 style ($T, $F) nor the TPTP style ($true, $false) particularly appealing.

quicquid commented 8 years ago

The idea for LLK formulas is that you can have simple Latex labels - since ^ means superscript and @ and \ are special symbols which need to be avoided anyway, I removed them.

gebner commented 8 years ago

I'm not really clear on the timeline of the parsers--did the prover9 parser originate from the LLK parser? Why would it inherit this limitation otherwise?

quicquid commented 8 years ago

Ah no, the prover 9 parser was first, but if i remember correctly, I added the infix operators later.

quicquid commented 8 years ago

By the way: what about true and false as reserved words?

gebner commented 8 years ago

I'd really like to avoid keywords if possible as they can clash with constant names--I think there are several examples in the TPTP which use the constant true.

gebner commented 8 years ago

Just a heads up, backslashes do not need to be escaped in string interpolators, but IntelliJ highlights them as an error anyways...:

scala> e"\x '\u22a4'"
res12: at.logic.gapt.expr.LambdaExpression = (λx.⊤)

So the ^ workaround is maybe not necessary.

loewenheim commented 8 years ago

By the way, if we want output to be parsable, then I think ⊃ needs to be recognized as an implication symbol.

gebner commented 8 years ago

@loewenheim Oops, yeah I just copy-and-pasted the symbols in your comment.

loewenheim commented 8 years ago

Not your fault, I didn't think of it until much later either.

gebner commented 8 years ago

So, I've just added a rudimentary pretty-printer:

scala> hof"?x p(x:nat,z,z:o)"
res7: at.logic.gapt.expr.HOLFormula = ?(x:nat) p(x, z:o, z)

scala> BussTautology(4).toFormula
res6: at.logic.gapt.expr.HOLFormula =
-(((c_1 | d_1) & (c_2 | d_2) & (c_3 | d_3) -> c_4) | ((c_1 | d_1) & (c_2 | d_2) &
        (c_3 | d_3) -> d_4)) | -(((c_1 | d_1) & (c_2 | d_2) -> c_3) | ((c_1 |
            d_1) & (c_2 | d_2) -> d_3)) | -((c_1 | d_1 -> c_2) | (c_1 | d_1 ->
        d_2)) | -(c_1 | d_1) | c_4 | d_4

scala> hof"a&b&c&d&e&f|c&d"
res9: at.logic.gapt.expr.HOLFormula = a & b & c & d & e & f | c & d

Yes, the line-wrapping is a bit ugly..

loewenheim commented 8 years ago

To be honest I don't understand how that is better than the old output.

gebner commented 8 years ago

The line-wrapping? I don't know either. But the awesome new feature is the following:

hof"""
-(((c_1 | d_1) & (c_2 | d_2) & (c_3 | d_3) -> c_4) | ((c_1 | d_1) & (c_2 | d_2) &
        (c_3 | d_3) -> d_4)) | -(((c_1 | d_1) & (c_2 | d_2) -> c_3) | ((c_1 |
            d_1) & (c_2 | d_2) -> d_3)) | -((c_1 | d_1 -> c_2) | (c_1 | d_1 ->
        d_2)) | -(c_1 | d_1) | c_4 | d_4
"""

The output is actually copy-and-pastable on a large class of formulas.

loewenheim commented 8 years ago

As of 647ce9852fff7a8e9c09451f4b2ed0b48ecfe7b2 the string interpolators are mostly what Gabriel proposed earlier: there are e, hof, hoa, hov, hoc, foe, fof, foa, fot, fov, foc. I'm not convinced that the interpolators for variables and constants are that useful, the way they currently work:

scala> fov"a"
java.lang.IllegalArgumentException: Term a cannot be read as a FOL variable. Parse it with fot.

Also I think that it might be a good idea to rename e to either le or hoe.

@gebner That is indeed a very nice feature. The part I don't understand is why the logical symbols have been replaced. I think formulas are easier to read with the proper mathematical notation.

gebner commented 8 years ago

The part I don't understand is why the logical symbols have been replaced.

Because I can't type them and didn't include them in the initial prototype. :smile:

java.lang.IllegalArgumentException: Term a cannot be read as a FOL variable. Parse it with fot.

The story for what is considered a variable and what is a constant is not yet very well thought out. For the record, fov"x" works just fine now. It's possible to tell the AST conversion to consider all identifiers as constants so that fov"a" works.

Also I think that it might be a good idea to rename e to either le or hoe.

:+1: on that, e seems kind of small and out of place next to hof.

loewenheim commented 8 years ago

I initially thought that replacing the unicode output was the point of your change, that's why I was confused. Definitely :+1: now.

As for le and hoe, which one should we choose? hoe fits better with the other interpolators, but le cleaves closer to the return type, which is LambdaExpression.

gebner commented 8 years ago

I'd rather tend to hot, which evokes a nicer connotation than hoe. I'm fine with le as well.

loewenheim commented 8 years ago

I'm for le then.

loewenheim commented 8 years ago

As for non-Unicode Top() and Bottom() symbols, how about Top and Bot? I'm not fond of keywords either, but I can't for the life of me think of good symbols we can use.

gebner commented 8 years ago

If we go for keywords, we should probably go directly for true and false since there really seems to be no obvious alternative here. I guess we'll just have to escape the constant true then: theorem('true') should be fine.

gebner commented 8 years ago

scala> hof"true | false <-> theorem('true')"
res0: at.logic.gapt.expr.HOLFormula = (⊤ ∨ ⊥ ⊃ theorem('true')) ∧ (theorem('true') ⊃ ⊤ ∨ ⊥)

That's it for today, good night everyone.

gebner commented 8 years ago

Hah, I think I'm getting the hang of this pretty printing stuff. That's how the debugging output from the nTapeTest looks right now:

∀x ∀y (x < y ∨ (x = y ∨ y < x)) ∧ ∀x ¬x < 0 ∧
(∀x ∀y (x + 1 < y + 1 ⊃ x < y) ∧ ∀x ∀y (x + 1 = y + 1 ⊃ x = y)) ∧
(∀x 0 * x = 0 ∧ ∀x ∀y ∀z (x < y ∧ y < z ⊃ x < z) ∧
    (∀x bm(0, x) = 0 ∧ ∀x ∀y (x < y ⊃ ∃k x + k + 1 = y))) ∧
(∀x ∀y (x = y ⊃ x + 1 = y + 1) ∧ ∀x ∀y ∀z (x < y ⊃ x + z < y + z) ∧
    (∀x ∀z ∀y (x + z < y + z ⊃ x < y) ∧ ∀x x + 0 = x) ∧
    (∀P (¬P ⊃ be(P) = 0) ∧ ∀x bm(x, 0) = x ∧
        (∀x ∀y ∀z x * y * z = x * y * z ∧ ∀x ∀y (x < y ⊃ x + 1 < y + 1)))) ∧
(∀x ∀y x + y = y + x ∧ ∀x ∀y ∀z (x = y ⊃ x + z = y + z) ∧
    (∀x (x = 0 ∨ ∃y y < x) ∧ ∀x ¬x + 1 = 0) ∧
    (∀Y (Y(0) ∧ ∀n (Y(n) ⊃ Y(n + 1)) ⊃ ∀n Y(n)) ∧ ∀x 0 < x + 1 ∧
        (∀x ∀y bm(x + 1, y + 1) = bm(x, y) ∧
            ∀x ∀y (x < y + 1 ⊃ x < y ∨ x = y))) ∧
    (∀x ∀y (x + 1) * y = x * y + y ∧ ∀x ¬x < x ∧
        (∀x ∀y ∀z x + y + z = x + y + z ∧ ∀x ∀y (x = y ∨ x < y ⊃ x < y + 1)) ∧
        (∀P (P ⊃ be(P) = 1) ∧ ∀x (x = 0 ∨ ∃k x = k + 1) ∧
            (∀x 0 + x = x ∧ ∀x ∀y x * y = y * x)))),
∀x (f(x) = 0 ∨ f(x) = 1)
:-
∀n
∃h
(∀i ∀j (i < n + 1 ∧ (j < n + 1 ∧ i < j) ⊃ h(i) < h(j)) ∧
    ∃s ∀i (i < n + 1 ⊃ f(h(i)) = s))
∀x (f(x) = 0 ∨ f(x) = 1),
∀x ∀y (x < y ∨ (x = y ∨ y < x)) ∧ ∀x ¬x < 0 ∧
(∀x ∀y (x + 1 < y + 1 ⊃ x < y) ∧ ∀x ∀y (x + 1 = y + 1 ⊃ x = y)) ∧
(∀x 0 * x = 0 ∧ ∀x ∀y ∀z (x < y ∧ y < z ⊃ x < z) ∧
    (∀x bm(0, x) = 0 ∧ ∀x ∀y (x < y ⊃ ∃k x + k + 1 = y))) ∧
(∀x ∀y (x = y ⊃ x + 1 = y + 1) ∧ ∀x ∀y ∀z (x < y ⊃ x + z < y + z) ∧
    (∀x ∀z ∀y (x + z < y + z ⊃ x < y) ∧ ∀x x + 0 = x) ∧
    (∀P (¬P ⊃ be(P) = 0) ∧ ∀x bm(x, 0) = x ∧
        (∀x ∀y ∀z x * y * z = x * y * z ∧ ∀x ∀y (x < y ⊃ x + 1 < y + 1)))) ∧
(∀x ∀y x + y = y + x ∧ ∀x ∀y ∀z (x = y ⊃ x + z = y + z) ∧
    (∀x (x = 0 ∨ ∃y y < x) ∧ ∀x ¬x + 1 = 0) ∧
    (∀Y (Y(0) ∧ ∀n (Y(n) ⊃ Y(n + 1)) ⊃ ∀n Y(n)) ∧ ∀x 0 < x + 1 ∧
        (∀x ∀y bm(x + 1, y + 1) = bm(x, y) ∧
            ∀x ∀y (x < y + 1 ⊃ x < y ∨ x = y))) ∧
    (∀x ∀y (x + 1) * y = x * y + y ∧ ∀x ¬x < x ∧
        (∀x ∀y ∀z x + y + z = x + y + z ∧ ∀x ∀y (x = y ∨ x < y ⊃ x < y + 1)) ∧
        (∀P (P ⊃ be(P) = 1) ∧ ∀x (x = 0 ∨ ∃k x = k + 1) ∧
            (∀x 0 + x = x ∧ ∀x ∀y x * y = y * x))))
:-
∀n
∃h
(∀i ∀j (i < n + 1 ∧ (j < n + 1 ∧ i < j) ⊃ h(i) < h(j)) ∧
    ∃s ∀i (i < n + 1 ⊃ f(h(i)) = s))

Not a single type ascription in sight! :grinning:

(It's not intentional that some of the quantifiers are on their own line, I just haven't figured out how to disable that. Also this debugging output should probably be disabled.)

gapt / gapt

String interpolation for HOL #481