drym-org / qi

An embeddable flow-oriented language.
58 stars 12 forks source link

Supporting alternative semantics for flows #116

Open countvajhula opened 7 months ago

countvajhula commented 7 months ago

In standard Qi, flows are functions, and their semantics are function invocation. We've often talked about alternative semantics for flows. Let's look at a few especially interesting ideas, in connection with a reference expression:

(~> (3) (-< _ _) (== add1 sqr) (>< sqrt) +)  ;=> 5

The present issue isn't about implementing these semantics but about structuring Qi so that these distinct semantics could share as much code and interoperate as seamlessly as possible.

For instance, it's likely that we will want to reuse the expander across all dialects of Qi, and only provide distinct compilers implementing the semantics.

Standard Qi

A flow is a function. The inputs to a flow are provided as arguments. The output is produced as return values.

~> produces a new function that runs each flow sequentially, passing the output of the preceding flow to the next one. == produces a new function that runs each flow on a single input independently, but still sequentially, concatenating the outputs in order. -< is like == but passing all inputs to each flow rather than just one. >< is like == but is variadic, independently applying a common flow to each input.

Process Qi

A flow is an abstract "process," such as an OS process or a thread. The inputs to a flow are provided as standard input, or on a designated input queue. The output is generated as standard out, or on a designated output queue.

~> produces a new process that runs each flow sequentially (each a distinct process), waiting for each process to complete before moving on to the next step. == produces a new process that runs each flow on a single input in parallel, "joining" on them (i.e. waiting for them to complete and produce their outputs) and then producing those outputs, concatenated. -< and >< are similar to == in their operation, producing a new process that runs each flow in parallel, passing in the appropriate inputs and joining the outputs.

In the reference example above, there will probably be 11 processes started (one for each flow, counting inner flows distinctly from wrapping flows, and with one separate process for each input mapped under ><).

The main value here is that due to Qi's functional nature, forms like ==, -< and >< naturally identify parallelizable components of the program, and this dialect is able to take advantage of that by allowing the executing platform (e.g. the OS) to manage fulfillment of these independent tasks in whatever way it deems best. These process-oriented semantics may be especially useful for data processing tasks where the cost of each task greatly exceeds the overhead of process management.

A variation on this is if the "process" abstraction is provided not by OS threads or processes, but by a "big data" backend like AWS Lambda, or a queue-and-workers system like RabbitMQ/Celery.

Streaming Qi

A flow is a generator, or more generally, a coroutine. The inputs serve to initialize the generators which may continue to provide values for a long time or forever. It is also possible to provide fresh input to running generators, which seems especially unusual compared to the other two semantics above.

~> does something like generator-map, passing the output of the preceding flow to the next one, one set (of multiple values) at a time. == produces a new generator that independently (but sequentially) does generator-map on the individual values produced by the preceding flow, concatenating the outputs of these generators in order and producing them upon each invocation. -< and >< are similar to == in their operation, producing a new generator that runs each flow independently but still sequentially, passing in the appropriate inputs and concatenating the outputs.

In the reference example, there would be 11 generators created and composed at various points.

Live Qi

A kind of combination of Process Qi and Streaming Qi. A flow is a process, but instead of completing a task and dumping a single set of outputs onto a queue, it runs incrementally, placing (a full set of) outputs on the queue as they are ready. The downstream process operates on each set of values received in its input queue independently. As with Process Qi, processes terminate when their input queue is exhausted, and they have produced all of their outputs.

~> produces a new process that starts child processes for each component flow, connecting their I/O queues in sequence. == produces a new process that starts child processes for each component flow, distributing received values on the input queue individually across the child input queues, waiting for a complete set of outputs to be produced before placing them on the output queue. -< and >< are similar to == in their operation, passing in the appropriate inputs to child flows and joining full sets of outputs before producing them.

Examples from @samdphillips:

;; trivial cat
(~> gen-read-line (effect displayln))

;; trivial grep
(~> gen-read-line
    (when (regexp-match? #px"error" _) _)
    (effect displayln))

This dialect may be especially useful in cases where a big computational task can be done partially and incrementally, such as text processing or neural nets.

Similar to Process Qi, the actual backend could be OS threads/processes or even "realtime" big data frameworks like Storm.

Swapping and Combining Alternatives

Finally, it would be ideal to be able to swap out the semantics by simply changing the require line or the #lang line. It would also be desirable to be able to combine these different semantics on demand. For instance, parallelism on forms like -< and == could be useful even in Standard Qi:

(require qi qi/parallel)

(~> (a-big-dataset) build-indexes (-< (mode parallel) big-computation another-big-computation _) annotate-original-dataset)

Could we indicate alternative semantics at the application level by simply importing a library in this way, without needing duplicate implementations? That is, we don't just want to write another library providing parallel behavior to -< in Standard Qi, but instead reuse the implementation in the "Process Qi" dialect. What would be the most ergonomic way to do this?

Acknowledgements

This has been brought up for discussion at various times, especially by @samdphillips and @dr-neptune.

NoahStoryM commented 7 months ago

Hello everyone! In the past period of time, I have tried to learn and understand category theory more comprehensively. While thinking about the connection between category theory and qi, I have also been thinking about this question: can we give more semantics to flow. Seeing this issue, I would like to share some of my thoughts.

I think qi's flow and category theory's morphism may be the same thing, and when I learned category theory, I made a very common mistake, that is, I thought that morphism must be something that accepts input and produces output. In fact, morphism can be anything, as long as it satisfies some composition rules.

Process Qi, Streaming Qi, and Live Qi are great, but they have one thing in common: they must receive input and then produce output. I thought of some non-mapping examples, hoping to inspire everyone.

Number Qi

A very simple example, used in category theory to explain that morphism does not have to be a mapping. A flow is a number, ~> is addition. For example

(~> 1 2 3 4) ; => (+ 1 2 3 4)

List Qi

A simple example similar to number Qi. A flow is a list, ~> is append. For example

(~> '(1 1 2) '(a b c)) ; => '(1 1 2 a b c)

Matrix Qi

This is one of my favorite examples. The previous two simple examples only have ~>, in this example, we can also define ==*, ==+, -<, >-, ><, <> operators. (In the matrix category, the product object and the sum object are the same object, so the behavior of ==* and ==+ are the same, I just use == uniformly.)

A flow is a matrix, ~> is matrix multiplication. For example

(define a
  ;; 2 × 3
  | 1 2 3 |
  | 4 5 6 |)

(define b
  ;; 3 × 2
  | 9 6 |
  | 8 5 |
  | 7 4 |)

(~> a b)
=>
;; 2 × 2
|  46 28 |
| 118 73 |

(~> b a)
=>
;; 3 × 3
| 33 48 63 |
| 28 41 54 |
| 23 34 45 |

(== a b)
=>
;; 5 × 5
| 1 2 3 0 0 |
| 4 5 6 0 0 |
| 0 0 0 9 6 |
| 0 0 0 8 5 |
| 0 0 0 7 4 |

(-< a (~> a b))
=>
;; 2 × 5
| 1 2 3  46 28 |
| 4 5 6 118 73 |

(>- a (~> b a))
=>
;; 5 × 3
|  1  2  3 |
|  4  5  6 |
| 33 48 63 |
| 28 41 54 |
| 23 34 45 |

(>< x)                                  ; (==* x x ...)
(<> x)                                  ; (==+ x x ...)

Higher-order flow

If the future Qi supports flow with different semantics, then we may consider defining a function that takes an A semantic flow as input and outputs a B semantic flow. Furthermore, we can also regard the function that maps different flows as flow. (a functor in category theory)

For example, every linear transformation can be represented by a matrix, we can establish a linear transformation to matrix mapping, so we can define the mapping between standard flow and matrix flow:

(define F #;matrix->linear-transformation (λ (m) (λ (v) (matrix* m v))))
(define G #;linear-transformation->matrix (λ (l) (l identity-matrix)))

;; ~>  : standard qi
;; ~>ℳ : matrix qi
(eq=? (F (~>ℳ m1 m2)) (~>  (F m1) (F m2)))
(eq=? (G (~>  l1 l2)) (~>ℳ (G l1) (G l2)))

We can also establish a mapping between standard flow and list flow, and regard this mapping as an action of a finite state machine: https://racket.discourse.group/t/functor-natural-tranformation-and-monad-in-racket/1930/6

In category theory, we can get natural transformations from functors. Further extension, we get 2-categories, 2-morphisms. By analogy to Qi, we can define 2-flow and higher-order flow.

typed qi

From the computing science point of view, category theory is a very strongly typed language, more strongly typed than any computer language. - Category Theory for Computing Science

In category theory, every morphism has a domain and a codomain, and only when cod(f) and dom(g) are the same object, can g∘f exist, so from this point of view, category theory can be regarded as a strongly typed language. Maybe we can refer to this property to design typed qi.

hardware description qi (Inspiration from category variants)

There are many variants of categories in category theory, such as multicategory (http://www.youtube.com/watch?v=D_pPNgGZYDs), whose morphism has only one object in its codomain, but can have multiple objects in its domain, so it is very suitable for studying circuits. Maybe we can analogize multicategory to implement qi similar to hardware description language.

Off-topic

Elegance and usefulness often meet unexpectedly. – by Mickaël Launay.

In the past year, because of some problems in my life, I had a hard time concentrating on GitHub, so I didn’t contribute much to the development of Qi. Fortunately, in this year, I had a lot of elegant ideas about the connection between programming, Qi, and category theory, and this reply is a small part of these ideas. My mathematical knowledge is not very good, and I haven’t systematically learned pure functional languages like Haskell, so I am now trying to sort out these ideas, planning to write a series of articles, using Racket as an example to explain category theory in programming, hoping that it will be helpful to the development of Qi in the future!

rocketnia commented 7 months ago

This topic, especially the direction @NoahStoryM's taking, reminds me of "Compiling to Categories" (https://github.com/compiling-to-categories/concat), which I think is about compiling Haskell expressions with lambdas and function calls so that all those functional operations are actually manipulating morphisms of some other Cartesian closed category. The impression I get of the Compiling to Categories work is that it's kind of a natural follow-up from the Haskell community's use of special notations to denote instances of the Monad, Applicative, and Arrow computation types. At the same time, it can be especially ambitious compared to those other notations because while an existing Haskell computation of type a -> b can be converted to a Monad, Applicative, or Arrow computation using a single conversion, the general case of using existing Haskell code as a specification of a Cartesian closed category morphism might involve actually recompiling the function bodies and type definitions it indirectly depends on, some of which may bottom out at built-ins that need whole new implementations for each category we want to use them in.

I've had Compilation to Categories in mind as a goal for the languages I tinker on, but not usually as top-of-mind a goal as compilation to specific executable targets like JavaScript. Compiling to multiple executable targets exhibits the same code reuse difficulties; not all built-in abstractions necessarily have an easy corresponding implementation on every platform.

Anyway, traversing through module boundaries to recompile things might not be needed for Qi. Qi's standard semantics, at least, is a lot like using Haskell's Arrow operations directly (without the arrow notation): Where a Qi flow uses ~>, ==, and -<, a Haskell program can use >>>, ***, and &&& respectively. So some kinds of alternative semantics, especially the ones @countvajhula listed, seem possible to express just by creating some kind of generic abstraction like the Arrow type class.

Finally, it would be ideal to be able to swap out the semantics by simply changing the require line or the #lang line.

I came up with a plan something like this for Lathe Morphisms a while back. When I thought about representing category theory concepts in Racket, I was finding it very easy to go in circles with generalization, wanting various different choices of foundational abstractions all at once. I decided I would eventually pursue several library plans at once, each one representing different sweet spots and located in a different collection of modules, with intuitive drop-in compatibilities between their data types wherever possible. If there was a new way I wanted to generalize from them, that new generalization could constitute its own additional variant, still compatible with the others. The introduction of new variants could make old ones seem obsolete sometimes, but people could have differences of opinion about which ones were the real sweet spots and which ones were obsolete.

A few other notes on structuring Qi this way:

Actually, @countvajhula, a lot of what you're describing sound like alternative types of "consume arguments and produce results" protocols. A lot of them sound rather seamlessly compatible in composition with each other; some of them just don't take advantage of all the flexibility available to them. What if a lot of flows were just plain procedures that consumed a list of arguments and produced a list of results, but other flows were instances of generic interfaces that indicated to Qi that they follow a different protocol? (This could be compile-time information in their transformer binding, if you don't want the cost of dispatching at run time.) Then you could have various protocols interact with each other with implicit conversions between them, or you could enforce explicit conversions if you prefer.

Streaming Qi

A flow is a generator, or more generally, a coroutine.

I think you'll want to see the Pipes library for Haskell if you haven't already. It has a tutorial with snazzy ASCII art diagrams. In that library, a Pipe is like an intermediary from a standalone argument and a coroutine interaction to a coroutine interaction and a final standalone result. For instance, it could implement one side effect (upon request from its downstream interaction) in terms of another effect (which it requests from its upstream interaction). The standalone input and output values are to allow sequential composition of interactions with state maintained in between; they're like the carry bits that get propagated into one side of a full adder and propagated out the other side.