aengelberg / instagenerate

Reversing instaparse with core.logic
77 stars 4 forks source link

Negative lookahead seems non relational but possible #1

Open zmaril opened 9 years ago

zmaril commented 9 years ago

I've taken a few passes at implementing negative lookahead and thought I should report back on some of the difficulties I've had. My main focus has been modifying the code for positive lookahead to see if it would work for negative lookahead somehow. I was trying to model negative lookahead as the negation of positive lookahead and so I quickly got dragged into the nonrelational parts of core.logic. I'm no expert on either logic programming/constaint logic programming nor how instagenerate has been implemented so I'll try to keep from theorizing about the following:

Interesting definition for negative lookahead

(defmethod combinator-parseo :neg
  [{comb :parser :keys [hide] :as combinator} grammar strings parse-tree remaining-strings remaining-parse-tree]
  (let [comb (assoc comb :hide true)]
    (fresh [fake-remaining-strings]
      (!= strings remaining-strings)
      (!= parse-tree remaining-parse-tree)
      (partial-parseo comb grammar strings parse-tree fake-remaining-strings parse-tree))))

Simple grammar with interesting results:

(def grammar
  (insta/parser
    "S = !('B') ('A' | 'B' | 'C')"))

(run* [input]
  (fresh [abc]
    (instaparseo grammar input
                 [:S abc])))
;; (((B . _0) :- (!= (("A") (_1)))) ((B . _0) :- (!= (("C") (_1)))))

So in some weird sense instagenerate can do negative lookahead but you would have to carefully rip apart all the cons to get the correct results. It seems like if I understand how this all tied together it would be straightforward to pull these results out in the right way but so far it has escaped me.

In addition to this, I tried to project strings and check that the negative lookahead's parsers didn't successfully parse the value of strings. I don't believe I ever got strings to be ground though and so I nevere got anything but exceptions trying this. Also, I tried using nafc but it created similiar output as above that was far more verbose while being effectively the same.

aengelberg commented 9 years ago

Thanks for taking a stab at this. I notice right off the bat that your "!=" statements should actually stay "==". Those statements are intended to state "we're constraining stuff about the input, but we're not actually eating characters while doing so." Thus, the strings == remaining strings, and the parse tree == remaining parse tree. This is why I have "fake remaining strings", so the parser can pretend it's eating characters but in reality it isn't doing any damage to the input. As for the parse tree, I just set :hide to true, and presumably it will already take care of not affecting the parse tree.

Because negative look ahead ALSO doesn't eat characters (or parse tree), it doesn't make sense to me to simply negate those statements. Somehow we have to negate that "partial-parseo" statement instead.

On Monday, October 27, 2014, Citizen notifications@github.com wrote:

I've taken a few passes at implementing negative lookahead and thought I should report back on some of the difficulties I've had. My main focus has been modifying the code for positive lookahead to see if it would work for negative lookahead somehow. I was trying to model negative lookahead as the negation of positive lookahead and so I quickly got dragged into the nonrelational parts of core.logic. I'm no expert on either logic programming/constaint logic programming nor how instagenerate has been implemented so I'll try to keep from theorizing about the following:

Interesting definition for negative lookahead

(defmethod combinator-parseo :neg [{comb :parser :keys [hide] :as combinator} grammar strings parse-tree remaining-strings remaining-parse-tree](let [comb %28assoc comb :hide true%29] %28fresh [fake-remaining-strings] %28!= strings remaining-strings%29 %28!= parse-tree remaining-parse-tree%29 %28partial-parseo comb grammar strings parse-tree fake-remaining-strings parse-tree%29%29))

Simple grammar with interesting results:

(def grammar (insta/parser "S = !('B') ('A' | 'B' | 'C')")) (run* [input](fresh [abc] %28instaparseo grammar input [:S abc]%29));; (((B . _0) :- (!= (("A") (_1)))) ((B . _0) :- (!= (("C") (_1)))))

So in some weird sense instagenerate can do negative lookahead but you would have to carefully rip apart all the cons to get the correct results. It seems like if I understand how this all tied together it would be straightforward to pull these results out in the right way but so far it has escaped me.

In addition to this, I tried to project strings and check that the negative lookahead's parsers didn't successfully parse the value of strings. I don't believe I ever got strings to be ground though and so I nevere got anything but exceptions trying this. Also, I tried using nafc but it created similiar output as above that was far more verbose while being effectively the same.

— Reply to this email directly or view it on GitHub https://github.com/aengelberg/instagenerate/issues/1.

--Alex

zmaril commented 9 years ago

Yeah I understood a little bit better what I had been doing about twenty minutes later. The above code gives the complement of the set of results that we want. If we negate all the rules that I had above, or equivalently negate only the last constraint, then we should get what we are looking for. I'll look more into what's possible on this front then.

zmaril commented 9 years ago

Here is something that doesn't work but has the right spirit:

(defmethod combinator-parseo :neg
  [{comb :parser :keys [hide] :as combinator} grammar strings parse-tree remaining-strings remaining-parse-tree]
  (let [comb (assoc comb :hide true)]
    (fresh [fake-remaining-strings]
      (== strings remaining-strings)
      (== parse-tree remaining-parse-tree)
      (nafc partial-parseo comb grammar strings parse-tree fake-remaining-strings parse-tree))))

nafc stands for "negation as failure constraint", I believe, and should be similiar to what we want. If you look at the source for nafc then you'll see that it is only runnable when all the arguments are ground (line 2692). Additionally, after throwing in some println's, it sems that nafc only checks for runnablitly once. The previous example produces ((\A) (\B) (\C)), which is the same output as if the constraint wasn't run at all. I'm not sure if this isn't a bug or if it's the intended result. At the time it checks, comb and grammar are ground while the rest of the arguments are not. nafc is supposed to be delayed if any of the arguments aren't ground, but I'm not sure if delayed means it will be retried later (no evidence this is the case in the places I've looked so far) or if delayed means forgotten entirely.

Prayer to @swannodette: could you please clarify how nafc should work if the terms supplied to the constraint aren't ground?

aengelberg commented 9 years ago

In my toy examples at the REPL, nafc seemed to work as expected. If the nafc docstring is true, in this case it would never activate because "fake-remaining-strings" is always fresh, so the constraint would never be triggered.

Here is my version, which works sometimes but not always: (defmethod combinator-parseo :neg [{comb :parser :keys [hide] :as combinator} grammar strings parse-tree remaining-strings remaining-parse-tree](let [comb %28assoc comb :hide true%29] %28all %28== strings remaining-strings%29 %28== parse-tree remaining-parse-tree%29 %28let [helper %28fn [strings] %28fresh [fake-remaining-strings] %28partial-parseo comb grammar strings parse-tree fake-remaining-strings parse-tree%29%29%29] %28nafc helper strings%29%29%29))

I'm using a closure there because I'm willing to execute the partial-parseo check as soon as strings is ground. In reality, I would like to short-circuit even earlier (e.g. if strings is '(\a . _0) and I'm doing a negative lookahead on "a") but this is a reasonable compromise as I predict it would work for most use cases.

This case doesn't work: => (run* [in out](instaparseo %28insta/parser) in out)) ([(\a) (:S "a")])

But this works: => (run* [in out](instaparseo %28insta/parser) (seq "a") out)) () Even more surprisingly, this doesn't work: => (run* [in out](instaparseo %28insta/parser) in out) (== in (seq "a"))) ([(\a) (:S "a")])

--Alex

On Mon, Oct 27, 2014 at 3:09 PM, Citizen notifications@github.com wrote:

Here is something that doesn't work but has the right spirit:

(defmethod combinator-parseo :neg [{comb :parser :keys [hide] :as combinator} grammar strings parse-tree remaining-strings remaining-parse-tree](let [comb %28assoc comb :hide true%29] %28fresh [fake-remaining-strings] %28== strings remaining-strings%29 %28== parse-tree remaining-parse-tree%29 %28nafc partial-parseo comb grammar strings parse-tree fake-remaining-strings parse-tree%29%29))

nafc stands for "negation as failure constraint", I believe, and should be similiar to what we want. If you look at the source for nafc https://github.com/clojure/core.logic/blob/fa9451ed57ba9647399e1e6e1d5a723e422d7c6b/src/main/clojure/clojure/core/logic.clj#L2680 then you'll see that it is only runnable when all the arguments are ground (line 2692). Additionally, after throwing in some println's, it sems that nafc only checks for runnablitly once. The previous example produces ((\A) (\B) (\C)), which is the same output as if the constraint wasn't run at all. I'm not sure if this isn't a bug or if it's the intended result. At the time it checks, comb and grammar are ground while the rest of the arguments are not. nafc is supposed to be delayed if any of the arguments aren't ground, but I'm not sure if delayed means it will be retried later (no evidence this is the case in the places I've looked so far) or if delayed means forgotten entirely.

Prayer to @swannodette https://github.com/swannodette: could you please clarify how nafc should work if the terms supplied to the constraint aren't ground?

— Reply to this email directly or view it on GitHub https://github.com/aengelberg/instagenerate/issues/1#issuecomment-60678325 .

zmaril commented 9 years ago

Here's a version of nafc that I've found useful:

(defn -my-nafc
  ([c args]
     (reify
       IConstraintStep
       (-step [this s]
         (reify
           clojure.lang.IFn
           (invoke [_ s]
             (when-not (tramp ((apply c args) s))
               ((remcg this) s)))
           IRunnable
           (-runnable? [_]
             (println "RUNNABLE?")
             (println (map #(ground-term? % s) args))
             (every? #(ground-term? % s) args))))
       IConstraintOp
       (-rator [_]
         `my-nafc)
       (-rands [_]
         (vec (concat [c] args)))
       IReifiableConstraint
       (-reifyc [_ v r s]
         `(my-nafc ~c ~@(-reify s args r)))
       IConstraintWatchedStores
       (-watched-stores [this] #{::subst}))))

(defn my-nafc
  "EXPERIMENTAL: negation as failure constraint. All arguments to the goal c
must be ground. If some argument is not ground the execution of this constraint
will be delayed."
  [c & args]
  (cgoal (-my-nafc c args)))

Nice thinking with the closure. In all the cases that didn't work, the constraint wasn't runnable because strings wasn't ground. This runs though:

(run* [in out]
  (== in (seq "a"))
  (instaparseo (insta/parser "S = !'a' 'a'") in out))

It looks like strings becomes ground via the first constraint and allows the nafc to be evaluated. Interesting that the order of the constraints matters here.

aengelberg commented 9 years ago

I'm surprised that strings was not ground in those cases. Wasn't I setting the value of strings when binding "in" to (seq "a")? I was setting it after the constraint was established, however.

At this point I'm probably exhausting my own expertise of core.logic, since I haven't gotten into the whole "constraint" side of things. I will need further guidance to understand when constraints can trigger. (Maybe it is a known limitation that constraints are doomed if the logic program goes out of scope of the constraint?)

--Ale ​x​ On Tue, Oct 28, 2014 at 7:38 AM, Citizen notifications@github.com wrote:

Here's a version of nafc that I've found useful:

(defn -my-nafc ([c args](reify IConstraintStep %28-step [this s] %28reify clojure.lang.IFn %28invoke [ s] %28when-not %28tramp %28%28apply c args%29 s%29%29 %28%28remcg this%29 s%29%29%29 IRunnable %28-runnable? [] %28println) (println (map #(ground-term? % s) args)) (every? #(ground-term? % s) args)))) IConstraintOp (-rator [] `my-nafc) (-rands [](vec %28concat [c] args%29)) IReifiableConstraint (-reifyc [_ v r s] `(my-nafc ~c ~@(-reify s args r))) IConstraintWatchedStores (-watched-stores [this] #{::subst})))) (defn my-nafc "EXPERIMENTAL: negation as failure constraint. All arguments to the goal cmust be ground. If some argument is not ground the execution of this constraintwill be delayed." [c & args](cgoal %28-my-nafc c args%29))

Nice thinking with the closure. In all the cases that didn't work, the constraint wasn't runnable because strings wasn't ground. This runs though:

(run* [in out](== in %28seq)) (instaparseo (insta/parser "S = !'a' 'a'") in out))

It looks like strings becomes ground via the first constraint and allows the nafc to be evaluated. Interesting that the order of the constraints matters here.

— Reply to this email directly or view it on GitHub https://github.com/aengelberg/instagenerate/issues/1#issuecomment-60765504 .

zmaril commented 9 years ago

I'm still trying to figure the terminology out but I think nafc is non-relational. core.logic is all about creating graphs of equality (relations). By introducing nonequality into the programs, we are no longer purely relational and so we run into situations where the semi-guarantees of relational programming (like the order of the constraints supposedly not changing the results) no longer hold. But yeah, I'm way outside my expertise already so far. I'll see how far I can get with just this for now.