Clashing loop environments?

bobbicodes commented 1 year ago

I've long suspected that there is a fundamental problem with the way loop/recur is handled in the interpreter, and if my suspicion is correct, would be responsible for many mysterious bugs.

I believe the issue is that there is a global loop_env variable. At the time I didn't think that it could cause any issue because I didn't consider that there would often be mutually recurring loops, causing the env to be lost and replaced with the one that is currently being interpreted.

So how to solve it? The one thing I can think of is to make the loop env a map, with each key being... what? I'm having trouble wrapping my head around it, even to make certain that it's indeed an issue, much less figure out a solution.

Why do I think it may not be an issue? Because each time an env is created, it stores a reference to the outer env, and the lookup strategy is to recursively climb up the chain of environments, so shouldn't all the locals be retrievable? This makes my brain hurt...

At the time of evaluation, the interpreter is only aware of the current form being evaluated, not the surrounding context. So when we hit recur, how do we ascertain which loop environment is the right one?

Since loops are so ubiquitous, it's critical that I get to the bottom of this, because the interpreter machinery must be 100% sound in order to robustly support everything built on top.

bobbicodes commented 1 year ago

For a concrete example, here is a solution to 4clojure #111, crossword puzzle:

(defn cw [word board]
  (let [across (map #(str/escape % {\space "", \_ \.}) board)
        down (apply map str across)]   ;; transpose the board so down becomes across
    (string? (->> (concat across down)
                  (mapcat #(clojure.string/split % #"#"))
                  (some #(re-matches (re-pattern %) word))))))

It fails on the second line:

(map #(str/escape % {\space "", \_ \.}) ["_ # _ _ e"]) => 
Error: 'index' not found

The index local is from str/escape:

(defn str/escape [s cmap]
  (loop [index 0
         buffer ""]
    (if (= (count s) index)
      buffer
      (if (get cmap (nth s index))
        (recur (inc index) (str buffer (get cmap (nth s index))))
        (recur (inc index) (str buffer (nth s index)))))))

If we remove the map, str/escape works fine on a single string:

(str/escape "_ # _ _ e" {\space "", \_ \.})
=> ". # . . e"

Where I suspect the problem lies is because map also uses a loop:

(defn map1 [f coll]
  (loop [s (seq coll) res []]
    (if (empty? s) res
        (recur (rest s) (conj res (f (first s)))))))

bobbicodes commented 1 year ago

Furthermore, what if hashing the loop ASTs isn't even enough? I could imagine a situation where the same function is being looped with independent local values. So it may not be good enough to know we are looping the correct AST, but the correct instance of that AST. Yes... brain hurty

This is obviously a solved problem, because every programming runtime has loops. But it seems that the only solution is to start digging into how compilers and interpreters handle it. But the problem is it's usually entangled within a bunch of incomprehensible stuff, at least to me.

bobbicodes commented 1 year ago

Here's an idea... we could gensym each local and keep one flat env.

What I should do is contrive a failing test case where one loops calls another.

How about this:

(def colls [[1 2] [3 4]])

(defn loop1 [coll]
  (loop [s coll res []]
    (if (empty? s) res
      (recur (rest s) (conj res (first s))))))

(defn loop2 [colls]
  (loop [s colls res []]
    (if (empty? s) res
      (recur (rest s) (conj res (loop1 (first s)))))))

(loop2 colls)

Indeed, in my interpreter this results in an infinite loop. But in Clojure, it correctly outputs the same vector:

user=> (loop2 colls)
[[1 2] [3 4]]

bobbicodes commented 1 year ago

I made a branch for this: https://github.com/bobbicodes/lisp-tutorial/tree/loop

Using postwalk I've uniquified the local symbol names:

(loop [s__0 colls res__1 []]
  (if (empty? s__0) res__1 
      (recur (rest s__0) (conj res__1 (loop1 (first s__0))))))

Then I think we can simply set the vars in the main env without worry.

bobbicodes commented 1 year ago

So then... how do we know which symbols to redefine at recur? One way might be to attach metadata somehow...

Actually that's a really good idea! Why didn't I think of that earlier?

bobbicodes commented 1 year ago

I've sketched out how this should work... the key word is metadata. It's essentially the same thing it's doing for functions. At loop, we store the AST right on the recur symbol so when it is called, it knows which vars need to be reset and the form to return to. Pretty simple, but I'm struggling to implement it as usual. I don't think there's any problem, I'm just fumbling to find the right combination of calls. It will be well worth it, because then the interpreter will be rock solid.

It's doing something funny that I hadn't anticipated. I was confused why it seems to be binding too many loop variables... it starts out perfectly, but then when it goes back into the inner loop function for the second time... it thinks it's a new loop! How silly of me not to realize that we need to have some way of resuming a loop that is already in execution that we are returning to! So no problem... we'll say, have another property on the loop symbol designating it as started, completed or whatever, so we don't create new variables when it's already mid-execution.

bobbicodes commented 12 months ago

This idea seemed to almost work, except for one crushing fact... it seems that the properties added as metadata don't persist across function calls. Since data does not simply disappear (I hope), what I believe is really the case is that the recur we add meta to is not the same recur that gets called later, causing this idea to implode.

I think I finally solved it by #6 . At least, the contrived example works. If it turns out to be not sufficient, I'll open a new issue. But I certainly hope that I don't have to think about loops for awhile because that was supremely annoying.

The solution:

case "loop":
    var loop_body = [types._symbol('do')].concat(ast.slice(2))
    var loop_env = new Env(env);
    var loopLocals = []
    for (var i = 0; i < a1.length; i += 2) {
        loop_env.set(a1[i], EVAL(a1[i + 1], loop_env));
        loopLocals.push(a1[i], EVAL(a1[i + 1], loop_env))
    }
    ast = loop_body
    env = loop_env
    break
case "recur":
    loopLocals.__isvector__ = true;
    var recurForms = ast.slice(1).flatMap(i => [i, i])
    for (let i = 1; i < recurForms.length; i += 2) {
        let f = EVAL(recurForms[i], loop_env)
        f.__isvector__ = true;
        loopLocals[i] = f
    }
    ast = [types._symbol('loop')].concat([loopLocals, loop_body])
    break

The "trick" is in the last line before the break, ast = [types._symbol('loop')].concat([loopLocals, loop_body]). What it's doing is constructing a new (loop ...) form with the updated bindings vector and shipping it back. As opposed to before when I was only saving the body of the loop, which made it possible to loop with the incorrect bindings.

bobbicodes commented 12 months ago

This obviously doesn't cover functions as a recur target, so that will have to be implemented next.

bobbicodes / bobbi-lisp

Clashing loop environments? #5