fosskers / cl-transducers

Transducers: Ergonomic, efficient data processing
https://fosskers.github.io/cl-transducers/
GNU Lesser General Public License v3.0
87 stars 4 forks source link

Adding iterators? #8

Open simendsjo opened 3 months ago

simendsjo commented 3 months ago

I experimented with adding iterators (as cl-transducers already defined enumerator) on top of transducers, and it works fine. Is this something you would consider adding to the library or accepting a PR for?

(defparameter i
  (make-generator-iterator
   (comp (map #'1+)
         (filter #'evenp))
   nil
   (ints 0)))

;; first match on second iteration
(next i)
 ; => 1, NIL

;; next match on fourth iteration
(next i)
 ; => 3, NIL

(defparameter s
  (make-generator-iterator
   (comp (map #'1+)
         (filter #'evenp)
         (take 3))
   #'+
   (ints 0)))

(next s)
 ; => 1, NIL

(next s)
 ; => 3, NIL

;; returns sum when done
(next s)
 ; => 12, DONE
simendsjo commented 3 months ago

I got time to experiment with this some more, and although there are plenty of room for improvements, it shows it is possible without much problem.

;; we need to handle both our own state and the reducer state
(defstruct (iter-acc)
  ;; state for the iterator wrapper
  (iter)
  ;; state for the underlying reducer
  (reducer))

(defstruct (iterator (:constructor %make-iterator)
                     (:print-function (lambda (iterator stream depth)
                                     (declare (ignore depth))
                                     (print-unreadable-object (iterator stream :type t :identity t)
                                       (with-slots (iter reducer) (iterator-acc iterator)
                                         (let ((current (if (eq iter *done*)
                                                            iter
                                                            (car iter))))
                                           (format stream "~a (current: ~a)" reducer current)))))))
  ;; wrapped state (iter-acc)
  (acc)
  ;; wrapped reducer
  (f)
  ;; fetching the next value from the source
  (next))

(declaim (ftype (function (t t (function () t)) *) make-iterator))
(defun make-iterator (xform f next)
  (let ((iter-f (lambda (&optional (acc nil a-p) (input nil i-p))
             (cond ((and a-p i-p) (make-iter-acc :reducer (funcall f (iter-acc-reducer acc) input)
                                                 :iter (cl:cons input t)))
                   ((and a-p (not i-p)) (make-iter-acc :reducer (funcall f (iter-acc-reducer acc))
                                                       :iter nil))
                   (t (make-iter-acc :reducer (funcall f)
                                     :iter (cl:cons 'not-started t)))))))
    (%make-iterator :acc (funcall iter-f)
                    :f (funcall xform iter-f)
                    :next next)))

;; the interface for iterators is the same as for transducers
(defgeneric iterator (xform f source))

;; for an iterator, we only need to know how to produce the next source value.
(declaim (ftype (function (generator) (function () t)) generator->next))
(defun generator->next (generator)
  (lambda ()
    (funcall (generator-func generator))))

;; they can all share the same `make-iterator' implementation
(defmethod iterator (xform f (source generator))
  (make-iterator xform f (generator->next source)))

;; note that they must return `*done*' when there are no more input.
;; to handle state, they can e.g. use a let
(declaim (ftype (function (list) (function () t)) list->next))
(defun list->next (list)
  (let ((rest list))
    (lambda ()
      (if (null rest)
          *done*
          (pop rest)))))

(declaim (ftype (function (cl:vector) (function () t)) vector->next))
(defun vector->next (vector)
  (let ((len (length vector))
        (i 0))
    (lambda ()
      (if (eql i len)
          *done*
          (prog1 (aref vector i)
            (incf i))))))

(defmethod iterator (xform f (source list))
  (make-iterator xform f (list->next source)))

(defmethod iterator (xform f (source cl:vector))
  (make-iterator xform f (vector->next source)))

;; alternative representation of a iterator/transducer.
;; it's a general version of the pipe in my earlier pull request
(defmacro pipe* (fn source &rest transducers-and-reducer)
  (let ((transducers (butlast transducers-and-reducer))
        (reducer (car (cl:last transducers-and-reducer))))
    (if (cdr transducers)
        `(,fn (comp ,@transducers) ,reducer ,source)
        `(,fn ,(car transducers) ,reducer ,source))))

;; `iter*' is like pipe in my other pull request
(defmacro iter* (source &rest transducers-and-reducer)
  `(pipe* iterator ,source ,@transducers-and-reducer))

;; an identity reducer, like #'pass for transducers.
;; we cannot use #'for-each as it always returns t
;; maybe useful for transducers too?
(defun pass-reducer (&optional (acc nil a-p) (input nil i-p))
  (cond ((and a-p i-p) input)
        ((and a-p (not i-p)) acc)
        (t nil)))

;; iteration often don't need to transform or reduce, so this is a convenience
;; macro
(defmacro iter (source &rest transducers)
  (if transducers
      `(pipe* iterator ,source ,@transducers #'pass-reducer)
      `(pipe* iterator ,source #'pass #'pass-reducer)))

;; when handling a source input, there are multiple states where `next-1' stops
;; rather than loop to find the next valid value.
(deftype step-state ()
  '(member
    :not-started
    :already-done
    :source-empty
    :transducer-dropped
    :transducer-done
    :reducer-done
    :value-skipped
    :value-produced))

;; steps through a single source input.
;; a combination of "transduce + reduce", e.g. `list-transduce' + `list-reduce',
;; only without looping until it finds a value.
(declaim (ftype (function (iterator) (values step-state t t t t)) next*))
(defun next-1 (iterator)
  (with-slots ((iter-acc acc) (iter-next next) (iter-f f)) iterator
    (with-slots ((acc-iter iter) (acc-reducer reducer)) iter-acc
      (let* ((source-value 'not-computed)
             (new-acc 'not-computed)
             (transduced-value 'not-computed)
             (reduced-value 'not-computed)
             (produced-value 'not-computed))
        (flet ((return-with (state)
                 (return-from next-1 (values state source-value produced-value reduced-value transduced-value))))
          ;; already done?
          (when (eq acc-iter *done*)
            (return-with :already-done))
          (setf source-value (funcall iter-next))
          ;; source empty?
          (when (eq source-value *done*)
            (setf acc-iter *done*)
            (return-with :source-empty))
          ;; process source value
          (setf new-acc (funcall iter-f iter-acc source-value))
          ;; transducer never called the accumulater; i.e. dropped the value
          (when (eq new-acc iter-acc)
            (return-with :transducer-dropped))
          ;; transducer told us it was done
          (when (reduced-p new-acc)
            (let* ((new-acc (reduced-val new-acc))
                   (acc-iter-new (iter-acc-iter new-acc)))
              (setf transduced-value (car acc-iter-new))
              (setf acc-reducer (iter-acc-reducer new-acc)
                    acc-iter *done*)
              (return-with :transducer-done)))
          ;; reducer told us to stop
          (when (reduced-p (iter-acc-reducer new-acc))
            (let ((reducer-acc (reduced-val (iter-acc-reducer new-acc))))
              (setf acc-reducer reducer-acc
                    acc-iter *done*)
              (setf reduced-value reducer-acc)
              (return-with :reducer-done)))
          ;; produced value
          (when (iter-acc-iter new-acc)
            (setf acc-iter (iter-acc-iter new-acc)
                  acc-reducer (iter-acc-reducer new-acc))
            (setf produced-value (car acc-iter))
            (return-with :value-produced))
          (error "BUG: unreachable"))))))

;; steps through a process
;; a combination of "transduce + reduce", e.g. `list-transduce' + `list-reduce',
;; only without looping until the process is done.
(defun next (iterator)
  (labels ((recurse ()
             (multiple-value-bind (state source-value produced-value reduced-value transduced-value) (next-1 iterator)
               (declare (ignore source-value reduced-value transduced-value))
               (with-slots ((acc-iter iter) (acc-reducer reducer)) (iterator-acc iterator)
                 (ecase state
                   (:already-done (values acc-reducer *done* *done*))
                   (:source-empty (values acc-reducer *done* *done*))
                   (:transducer-dropped (recurse))
                   (:transducer-done (values acc-reducer *done* *done*))
                   (:reducer-done (values acc-reducer produced-value *done*))
                   (:value-produced (values acc-reducer produced-value nil))
                   (:value-skipped (recurse)))))))
    (recurse)))

(defun done? (iterator)
  (eq *done* (iter-acc-iter (iterator-acc iterator))))

;; runs `next' until done
(defun drain (iterator)
  (do ((_ nil (next iterator)))
      ((done? iterator) (iter-acc-reducer (iterator-acc iterator)))))

;; construct a reducer which has a side-channel
(defun tee (reducer side-channel)
  (lambda (&optional (acc nil a-p) (input nil i-p))
    (cond ((and a-p i-p)
           (funcall side-channel acc input)
           (funcall reducer acc input))
          ((and a-p (not i-p))
           (funcall side-channel acc)
           (funcall reducer acc))
          (t
           (funcall side-channel)
           (funcall reducer)))))

;; runs a supplied callback for each input
(defun callback (callback &optional (state nil s-p))
  (if s-p
      (lambda (&optional (acc nil a-p) (input nil i-p))
        (cond ((and a-p i-p) (funcall callback input acc))
              ((and a-p (not i-p) acc))
              (t state)))
      (lambda (&optional (acc nil a-p) (input nil i-p))
        (cond ((and a-p i-p) (funcall callback input))
              ((and a-p (not i-p) acc))
              (t state)))))

;; the api for `iterator' is the same as `transduce'
;; the difference is that iterator will not loop until done, but let the user
;; trigger the `next' step
#+nil
(iterator #'pass #'pass-reducer '(1 2 3))
 ; => #<ITERATOR NIL (current: NOT-STARTED) {100DEC04F3}>

;; a transducer is simply an iterator that is drained
;; ... but it has more overhead - probably several low-hanging fruits here
#+nil
(transduce #'pass #'+ '(1 2 3))
 ; => 6 (3 bits, #x6, #o6, #b110)
#+nil
(drain (iterator #'pass #'+ '(1 2 3)))
 ; => 6 (3 bits, #x6, #o6, #b110)

;; like `pipe', we have `iter*'
#+nil
(iterator #'pass #'pass-reducer '(1 2 3))
 ; => #<ITERATOR NIL (current: NOT-STARTED) {100D7278A3}>
#+nil
(iter* '(1 2 3) #'pass #'pass-reducer)
 ; => #<ITERATOR NIL (current: NOT-STARTED) {100DD784F3}>

;; `iter' is a shorthand which passes pass and pass-reducer by default.
#+nil
(iter '(1 2 3))
 ; => #<ITERATOR NIL (current: NOT-STARTED) {100DCE84D3}>

;; `callback' is called for each reduction. Not specific to iterators, and is
;; probably useful for transducers as well.
#+nil
(drain (iter* '(1 2 3) #'pass (callback (lambda (x) (format t "x:~a~%" x)))))
; x:1
; x:2
; x:3
;  => NIL

;; `tee' let's you call multiple reducers. Useful for having a side-channel in
;; addition to the actual computed result. Also not specific to iterators and
;; might be useful in transducers to handle side-effects.
;; The following prints each value, and returns the sum
#+nil
(drain (iter* '(1 2 3) #'pass (tee #'+ (callback (lambda (x) (format t "x:~a~%" x))))))
; x:1
; x:2
; x:3
;  => 6 (3 bits, #x6, #o6, #b110)

;; `next-1' steps a single input from the source and does not loop. Only intended for debugging purposes.
;; the following returns the value 0 even though we're dropping it.
#+nil
(next-1 (iter (ints 0) (drop 10)))
 ; => :TRANSDUCER-DROPPED, 0, NOT-COMPUTED, NOT-COMPUTED, NOT-COMPUTED

;; `next' will continue until a value is computed, so the following returns 10
#+nil
(next (iter (ints 0) (drop 10)))
 ; => 10, 10, NIL
fosskers commented 3 months ago

I wanted to respond to this properly but did not manage to get to it this weekend (I was also out of town but for another reason). My trip starts tomorrow morning, so unfortunately I can't dedicate the time until I get back. Please forgive the delay.

Otherwise, this is likely what we'd need to solve the "zip problem", which makes me excited.

simendsjo commented 3 months ago

Otherwise, this is likely what we'd need to solve the "zip problem", which makes me excited.

We don't actually need a full iterator thing to support this, only iterating the sources.

I extracted the "fetch next value from source" to a regular function per type, and then we can easily build a new one which fetches from each of the sources, short-circuting when one is done.

Notice that this also fixes the "generic concat" issue.

TRANSDUCERS> (transduce #'pass #'cons (multi-iter '(1 2 3) (cl:vector 'a 'b) "foobar"))
((1 A #\f) (2 B #\o))

The design is simple, a source-iter contains a next function which either produce a value or returns *done*. I also have a initialize and finalize to support things like automatically opening and closing files.

I converted the existing code to this pattern without any issues.

Of course, having this greatly simplifies the iterator design too. transduce could use iterator directly, but it does have a little bit overhead. I plan on removing a memory allocation, but it will still be slower than simply looping and reducing in one go without additional structure.

The above code for multi-iter and list and vector iteration:

(defun list-iter (list)
  (let ((rest list))
    (make-source-iter :next (lambda ()
                              (if (null rest)
                                  *done*
                                  (pop rest))))))
(defun vector-iter (vector)
  (let ((len (length vector))
        (i 0))
    (make-source-iter :next (lambda ()
                              (if (eql i len)
                                  *done*
                                  (prog1 (aref vector i)
                                    (incf i)))))))

(defun multi-iter (source &rest more-sources)
  (let ((sources (mapcar #'ensure-source-iter (cl:cons source more-sources))))
    (make-source-iter
     :initialize (lambda ()
                   (dolist (source sources)
                     (funcall (source-iter-initialize source))))
     :finalize (lambda ()
                 (dolist (source sources)
                   (funcall (source-iter-finalize source))))
     :next (lambda ()
             (block next
               (let ((result '()))
                 (dolist (source sources (nreverse result))
                   (let ((value (funcall (source-iter-next source))))
                     (if (eq value *done*)
                         (return-from next *done*)
                         (push value result))))))))))
fosskers commented 1 month ago

Hi, a quick update from me. Now that I'm back, my priorities are:

  1. Finish my current Aura work.
  2. Address this redesign work in another lib of mine
  3. Address these transducers redesigns.

One thing I haven't done is properly setup benchmarks for this repository. The original iteration algorithms were based on the SRFI-171 implementations, which I lifted and adapted to the various CL types. I'd like to know how these two approaches differ in terms of performance. The existing labels approach should be pretty damn fast, but it has the weaknesses we've already identified. Namely that "sources" can't be composed well. The first thing I will do once reaching (3) is setting up said benchmarks.

simendsjo commented 1 month ago

Colin Woodbury @.***> writes:

2 Address this redesign work in another lib of mine

Ouch, that's one big PR ;)

One thing I haven't done is properly setup benchmarks for this repository. The original iteration algorithms were based on the SRFI-171 implementations, which I lifted and adapted to the various CL types. I'd like to know how these two approaches differ in terms of performance. The first thing I will do once reaching (3) is setting up said benchmarks.

The existing code should hopefully have the exact same performance characteristics as before. It could have been refactored to use this new iterators, but that would make it a lot slower. Benchmarks would be nice though. It would be interesting to see just how much slower than reducing the iterators is.

simendsjo commented 1 month ago
3. Address these `transducers` redesigns.

Noticed I hadn't even pushed the code up :/

As I ended up with code some changes, some controversial, I thought it would be best to discuss a bit before I finalized the changes. The changes include my changes, but it's far from clean.

The pipe, for and lazy is the more controversial things that should probably be left out for now. source-iter and iterator is the main required change. Convert comp to a macro was a todo which is probably ok.