MikeInnes opened 8 years ago
I like the plan. Would it make sense to do a quick prototype of all this to see how it works with ALS and logistic regression – the largest problems we have working with CFW at the moment? They tick all the boxes, and are small enough to reason about meaningfully.
Also, they make for great demos.
I think that kind of demo would be great for the API work but I'm not sure it makes as much sense for the ideas discussed here.
In order to reason about parallel composition, we need test problems which (a) are composed of multiple smaller parallel problems/algorithms and/or (b) will benefit from very different runtimes/optimisations cooperating together.
Given that this is a reasonably ambitious plan (ok, understatement...) I'm thinking the best short-term approach is to keep these ideas in the back of our minds while designing CF and similar libraries, then begin to coax general solutions out of them. (I think this is where @shashi is at the moment as well, although he may well have a better idea.)
The more I think about it, the more I think a great "IR" for array code is going to be some notation like that in TensorOperations.jl (tracked in #45): it allows parallelism, sits at a higher level than specific hardware (e.g. CPU, GPU) and software libraries (e.g. it can still call BLAS by dispatching on a subexpression).
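To make the dispatch-on-a-subexpression idea concrete, here is a minimal, hypothetical sketch (in Python, with invented names – the real notation would be Julia macros as in TensorOperations.jl): the IR records abstract index labels rather than loops, so a lowering pass can pattern-match a contraction like `C[i,k] = A[i,j]*B[j,k]` and route just that subexpression to BLAS, falling back to generic loops elsewhere.

```python
from dataclasses import dataclass

# Hypothetical index-notation IR node: operands plus their index labels.
# No loops are written down; the indices alone describe the computation.
@dataclass
class Contract:
    a: list            # left operand (nested lists stand in for arrays)
    b: list            # right operand
    ia: tuple          # index labels of a, e.g. ('i', 'j')
    ib: tuple          # index labels of b, e.g. ('j', 'k')

def lower(expr):
    """Choose a backend by pattern-matching the index structure."""
    if expr.ia[1] == expr.ib[0]:     # matches C[i,k] = A[i,j] * B[j,k]
        return "BLAS gemm"           # a real system would call dgemm here
    return "generic loops"           # anything else: fall back

e = Contract([[1, 2]], [[3], [4]], ('i', 'j'), ('j', 'k'))
print(lower(e))  # → BLAS gemm
```

The point is that the notation stays hardware-agnostic; only the lowering pass knows about BLAS, GPUs, and so on.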
I like the idea. About the DAG workflow, I think it fits the concept of concurrency rather than parallelism. I was inspired by Rob Pike's Go talk, 'Concurrency Is Not Parallelism'.
@shashi and I have discussed how we might use a graph-program format as a kind of parallel IR.
The idea is to have an hourglass design: the DAG is a small core abstraction; above it you build parallel programs, libraries, and algorithms; below it you build custom optimisation passes, integrations with other backends and hardware, and so on. If things are loosely coupled enough, then the problem statement and its implementation become orthogonal; everything is very plug-and-play, and you start building a parallel computing ecosystem rather than just a bunch of apps on top of a single framework.
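As a rough illustration of how small that core could be (a hypothetical Python sketch, not CF's actual design): the entire "waist" of the hourglass is a node type plus an interpreter. Programs above only build nodes; schedulers and optimisation passes below only walk them.

```python
from dataclasses import dataclass

# The core abstraction: a node knows only its inputs and an opaque callable.
@dataclass(frozen=True)
class Node:
    op: callable          # what this node computes
    inputs: tuple = ()    # upstream Nodes it depends on

def execute(node, cache=None):
    """Naive sequential interpreter; a real scheduler or an optimisation
    pass would sit here instead, without the program above knowing."""
    if cache is None:
        cache = {}
    if node not in cache:
        args = [execute(dep, cache) for dep in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]

# "Above" the waist: a tiny program built purely out of nodes.
a = Node(lambda: 2)
b = Node(lambda: 3)
s = Node(lambda x, y: x + y, (a, b))
print(execute(s))  # → 5
```

The loose coupling is the point: swapping `execute` for a parallel scheduler, or rewriting the graph before execution, touches nothing above the waist.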
From 10,000 feet, the building blocks are:
Flow.jl is a start on (1). Shashi has had (3) in mind while designing CF. (2) and (4) involve a lot of tough design decisions that we haven't figured out yet. For example, how do we get optimisation passes or schedulers to cooperate without explicitly knowing about each other? (5) will take a while.
Of course, it's absolutely key that the DAG is expressive enough to represent a useful subset of parallel programs. Dynamic sizing is enough to express recursive / irregular problems. What about other abstractions, like distributed channels – how do they fit in? Do they need to?
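To illustrate why dynamic sizing is enough for recursive/irregular problems (a hypothetical sketch, names invented): if executing a node is allowed to emit further nodes, the graph unfolds at runtime rather than being fixed up front, which is exactly what recursion needs.

```python
# A task may return either a value (leaf) or a description of subtasks
# it expands into at runtime – "dynamic sizing" of the DAG.
def fib_task(n):
    if n < 2:
        return n                                      # leaf: concrete value
    return ('+', fib_task, n - 1, fib_task, n - 2)    # expands into two subtasks

def run(task, arg):
    """Tiny interpreter that expands dynamic nodes as they appear."""
    out = task(arg)
    if isinstance(out, tuple):                        # node expanded: recurse
        _, f1, a1, f2, a2 = out
        return run(f1, a1) + run(f2, a2)
    return out                                        # leaf value

print(run(fib_task, 10))  # → 55
```

Distributed channels are a different beast – they express ongoing communication between long-lived tasks, which a pure dataflow DAG doesn't capture directly – hence the question of whether they need to fit in at all.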
To reason about this I think it may be useful to evaluate the dimensions along which parallel programs may vary. Off the top of my head:
e.g. BigLinalg is 001, NNs are 111. What other dimensions exist, and can we come up with practical examples across all of them?
Any obvious holes in this plan?