bobbicodes / bobbi-lisp

Interactive Lisp environment for learning Clojure
https://bobbicodes.github.io/bobbi-lisp/
0 stars 0 forks source link

Lazy evaluation #13

Open bobbicodes opened 10 months ago

bobbicodes commented 10 months ago

I'm not a huge fan of this feature because it causes an explosion of complexity. So far I've been implementing just certain things that are especially useful, like cycle, iterate, and range , but even those are only partially working. It's annoying because once it exists, absolutely everything else needs to know how to deal with them.

There's a mathematician youtuber I follow named Norman Wildberger who opposes the concept of infinity, and developed an entire pedagogy that refuses to acknowledge it. I find this comforting, and share his sentiment when applied to programming. This might sound rather naive to some people but I'm finding that every time I've rewritten a function because I don't have lazy seqs, I like the result better. Just say what you want! Don't pretend that there are things that don't exist!

The squint compiler seems to have solved this by going all in on js iterables/generators, and I recently spent a whole day trying to see if I could just drop them in and hook them up but it didn't work out so well. Still, it seems like the right approach so I'll walk through it here just to give the basic idea.

There's a lazy-seq macro which initializes a LazySeq object with the body wrapped in a thunk:

(defn core-lazy-seq
  "Takes a body of expressions that returns an ISeq or nil, and yields
  a ISeqable object that will invoke the body only the first time seq
  is called, and will cache the result and return it on all subsequent
  seq calls."
  [_ _ & body]
  `(new cljs.core/LazySeq (fn [] ~@body)))

I'm a bit confused by the [_ _ & body], like what are the first two args being ignored?

It appears that they are &form and &env, which all the macros receive. So we won't worry about that.

Found it, it's this line in the compiler:

(apply macro expr {} (rest expr))

We might need to define new somewhere. That might be the missing piece, when I tried it before I think I made another JavaScript function that would instantiate a new LazySeq.

Kind of like how this works:

class LazyIterable {
  constructor(gen) {
    this.gen = gen;
  }
  [Symbol.iterator]() {
    return this.gen();
  }
}

export function lazy(f) {
  return new LazyIterable(f);
}

It seemed like a decent way to go but it failed... maybe that's why it needs to be a macro, to prevent it from being evaluated? But looking at this makes me believe there's probably a new Clojure function somewhere...

But I think what it's passing the body to is this ES6 class:

export class LazySeq {
  constructor(f) {
    this.f = f;
  }
  *[Symbol.iterator]() {
    yield* this.f();
  }
}

I don't know enough about the JavaScript iteration protocols to know what I'm doing, which is probably why I couldn't get it to work in mine. And I'm actually not entirely sure if I want to use them. For cycle and iterate I ended up just using my own custom objects:

class Iterate {
    constructor(f, x) {
        this.name = 'Iterate'
        this.f = f
        this.realized = [x];
    }
    next() {
        this.realized.push(this.f(this.realized[this.realized.length - 1]))
        return this.realized;
    }
}

function iterate(f, x) {
    return new Iterate(f, x)
}

class Cycle {
    constructor(coll) {
        this.name = 'Cycle'
        this.coll = coll
    }
}

function cycle(coll) {
    return new Cycle(coll)
}

I especially like the way cycle is consumed by take:

if (types._cycle_Q(coll)) {
    const cycles = Math.floor(n / coll.coll.length)
    const mod = n % coll.coll.length
    let res = []
    for (let i = 0; i < cycles; i++) {
        res = res.concat(coll.coll)
    }
    if (mod != 0) {
        res = res.concat(coll.coll.slice(0, mod))
    }
    return res
}

The only time I used iterators/generators is for range, because it happens to be the first example in the MDN docs:

// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_Generators#generator_functions
function* makeRangeIterator(start = 0, end = Infinity, step = 1) {
    let iterationCount = 0;
    for (let i = start; i < end; i += step) {
        iterationCount++;
        yield i;
    }
    return iterationCount;
}

function range(start, end, step) {
    if (arguments.length === 0) {
        // uses above generator
        var iterator = makeRangeIterator()
        iterator.name = 'lazyRange'
        return iterator
    }
...
}

I had to add the line iterator.name = 'lazyRange' because it seems to be the only way to tell if an object is the thing that it is! It was a rather frustrating experience, like isn't that the most basic thing one would need to do?

This allows creation of a predicate function so it can be identified:

export function _lazy_range_Q(x) {
    if (x === null) {
        return false
    }
    if (typeof (x) === "object") {
        return Object.hasOwn(x, 'name') && x.name === 'lazyRange'
    }
    return false
}

It's then consumed by things like take like so:

function take(n, coll) {
    if (types._lazy_range_Q(coll)) {
        return range(0, n)
    }
...
}
bobbicodes commented 10 months ago

I'm finding squint's output quite enlightening. Here is a simple lazy fibonnacci:

(defn fib
  ([]
   (fib 1 1))
  ([a b]
   (lazy-seq (cons a (fib b (+ a b))))))
var fib = (function () {
    let f1 = (function (var_args) {
        let G__45 = arguments["length"];
        switch (G__45) {
            case 0:
                return f1.cljs$core$IFn$_invoke$arity$0();
                break;
            case 2:
                return f1.cljs$core$IFn$_invoke$arity$2((arguments[0]), (arguments[1]));
                break;
            default:
                throw new Error(squint_core.str("Invalid arity: ", squint_core.alength(arguments)))
        }
    });
    f1["cljs$core$IFn$_invoke$arity$0"] = (function () {
        return fib(1, 1);
    });
    f1["cljs$core$IFn$_invoke$arity$2"] = (function (a, b) {
        return new squint_core.LazySeq((function () {
            return squint_core.cons(a, fib(b, (a + b)));
        }));
    });
    f1["cljs$lang$maxFixedArity"] = 2;
    return f1;
})();

I wish I'd seen this before when I was breaking my head on arities, but that's water under the bridge, unless the way I did it turns out to be insufficient. But now I'm thinking that there should be enough here for me to reverse engineer lazy seqs.

bobbicodes commented 10 months ago

I tried adding lazy-seq like the way LazyIterator works, but this is what happens:

(defn fib
  ([]
   (fib 1 1))
  ([a b]
   (lazy-seq (cons a (fib b (+ a b))))))

(take 5 (fib))
=> 
Error: Maximum call stack size exceeded 

It's weird because I wouldn't expect it to even run anything, I thought it would tell me it doesn't know how to print the iterator or whatever it is.

bobbicodes commented 10 months ago

I just merged my attempt at porting the lazy stuff from squint, see #18

TL;DR it didn't work, but I got farther than before:

(defn fib
  ([]
   (fib 1 1))
  ([a b]
   (lazy-seq (cons a (fib b (+ a b))))))

(take 5 (fib))
=> (1 [object Object])

I guess that's an improvement? Hey, at least it returned the 1... the important thing is that it doesn't interfere with anything else which is why I went ahead and merged it.

We now have the new special form:

case "new":
    var constructor = EVAL(a1, env)
    var args = EVAL([types._symbol('do')].concat(ast.slice(2)), env)
    return new constructor(args)

As far as I can tell, it works. It instantiates the LazySeq object with the proper values. But since it doesn't work, there's no way to know until I test it on something else.

bobbicodes commented 10 months ago

Since I'm at a loss at how to proceed with the js iterable path, I think I'll attempt to implement it my own way. The concept is simple, and similar to what I did with iterate and cycle. We already have the lazy-seq macro which wraps the body in a function, so I'll just replace the LazySeq object with something else. Each time a value is yielded it will be cached, I think an array should suffice. We also need a way of knowing whether it is fully realized. And I believe the rest is up to the other things to know how to operate on them.

Here is a summary of some explanations I found:

https://stackoverflow.com/questions/44095400/how-to-understand-clojures-lazy-seq

It walks through the following function:

(defn rep [n]
  (lazy-seq (cons n (rep n))))

The key is to look how `LazySeq.first`` works:

final synchronized Object sval(){
    if(fn != null)
        {
                sv = fn.invoke();
                fn = null;
        }
    if(sv != null)
        return sv;
    return s;
}

final synchronized public ISeq seq(){
    sval();
    if(sv != null)
        {
        Object ls = sv;
        sv = null;
        while(ls instanceof LazySeq)
            {
            ls = ((LazySeq)ls).sval();
            }
        s = RT.seq(ls);
        }
    return s;
}

public Object first(){
    seq();
    if(s == null)
        return null;
    return s.first();
}

It will invoke the wrapped function to obtain and memoize the result. In your case it will be the following code:

(cons n (rep n))

Thus it will be a cons cell with n as its value and another LazySeq instance (result of a recursive call to rep) as its rest part. It will become the realised value of this LazySeq object and first will return the value of the cached cons cell.

public ISeq more(){
    seq();
    if(s == null)
        return PersistentList.EMPTY;
    return s.more();
}

When you call more on it, it will in the same way ensure that the value of the particular LazySeq object is realised (or reused memoized value) and call more on it (in this case more on the cons cell containing another LazySeq object).

Once you obtain another instance of LazySeq object with more the story repeats when you call first on it.

map and take will create another lazy-seq that will call first and more of the collection passed as their argument (just another lazy seq) so it will be similar story. The difference will be only in how the values passed to cons are generated (e.g. calling f to a value obtained by first invoked on the LazySeq value mapped over in map instead of a raw value like n in your rep function).

With reduce it's a bit simpler as it will use loop with first and more to iterate over the input lazy seq and apply the reducing function to produce the final result.

bobbicodes commented 10 months ago

https://stackoverflow.com/questions/22328304/if-the-only-non-stack-consuming-looping-construct-in-clojure-is-recur-how-doe/22328694#22328694

A lazy sequence has the rest of the sequence generating calculation in a thunk. It is not immediately called. As each element (or chunk of elements as the case may be) is requested, a call to the next thunk is made to retrieve the value(s). That thunk may create another thunk to represent the tail of the sequence if it continues. The magic is that (1) these special thunks implement the sequence interface and can transparently be used as such and (2) each thunk is only called once -- its value is cached -- so the realized portion is a sequence of values.

Here it is the general idea without the magic, just good ol' functions:

(defn my-thunk-seq 
  ([] (my-thunk-seq 1)) 
  ([n] (list n #(my-thunk-seq (inc n)))))

(defn my-next [s] ((second s)))

(defn my-realize [s n] 
  (loop [a [], s s, n n] 
    (if (pos? n) 
      (recur (conj a (first s)) (my-next s) (dec n)) 
      a)))

user=> (-> (my-thunk-seq) first)
1
user=> (-> (my-thunk-seq) my-next first)
2
user=> (my-realize (my-thunk-seq) 10)
[1 2 3 4 5 6 7 8 9 10]
user=> (count (my-realize (my-thunk-seq) 100000))
100000 ; Level stack consumption

The magic bits happen inside of clojure.lang.LazySeq defined in Java, but we can actually do the magic directly in Clojure (implementation that follows for example purposes), by implementing the interfaces on a type and using an atom to cache.

(deftype MyLazySeq [thunk-mem]
  clojure.lang.Seqable 
  (seq [_] 
    (if (fn? @thunk-mem) 
      (swap! thunk-mem (fn [f] (seq (f)))))
      @thunk-mem)
  ;Implementing ISeq is necessary because cons calls seq
  ;on anyone who does not, which would force realization.
  clojure.lang.ISeq
  (first [this] (first (seq this)))
  (next [this] (next (seq this)))
  (more [this] (rest (seq this)))
  (cons [this x] (cons x (seq this))))

(defmacro my-lazy-seq [& body] 
  `(MyLazySeq. (atom (fn [] ~@body))))
bobbicodes commented 10 months ago

Seeing how involved it actually is makes me think perhaps it would make more sense to just learn how iterables work, since I'd be essentially reimplementing the same stuff they are designed to do.

But I had another thought, a possible explanation for why I couldn't get the code that works in squint to work in my interpreter. It's because the 2 projects are actually fundamentally different.

In squint, the function passed to the LazySeq constructor is actually a JavaScript function and nothing more. It's already been compiled at that point. But here, while the body of the function is represented in JavaScript, it represents a Clojure AST! In other words, it still needs to be interpreted to be executed. I couldn't hand that function off to someone else and expect them to run it outside of my interpreter, so it's the same with the generator or whatever... it doesn't know how to get from an AST to something that does something... that's the interpreter's job.