JuliaParallel / Dagger.jl

A framework for out-of-core and parallel execution

Design Discussion: Parallel IR #36

Open MikeInnes opened 8 years ago

@shashi and I have discussed how we might use a graph-program format as a kind of parallel IR.

The idea is to have an hourglass design; the DAG is a small core abstraction, above it you build parallel programs, libraries, algorithms, below it you build custom optimisation passes, integrate with other backends and hardware and so on. If things are loosely coupled enough then the problem statement and its implementation become orthogonal; everything is very plug-and-play, and you start building a parallel computing ecosystem rather than just a bunch of apps on top of a single framework.
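
To make the "small core" concrete, here's a rough sketch of the kind of DAG value I mean (the names `Thunk`, `node` and `compute` are illustrative, not a committed API): a node is just a function plus its inputs, and the layers above and below the hourglass only ever see this one structure.

```julia
# Minimal DAG core (illustrative names): a node is a function plus its
# inputs, which may themselves be nodes.
struct Thunk
    f::Function
    inputs::Tuple
end

node(f, xs...) = Thunk(f, xs)

# A naive sequential evaluator. A real scheduler walks the same graph
# but decides where and when each node runs (and caches shared nodes
# instead of recomputing them).
compute(x) = x
compute(t::Thunk) = t.f(map(compute, t.inputs)...)

a = node(+, 1, 2)
b = node(*, a, 10)
compute(b)  # == 30
```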

From 10,000 feet, the building blocks are:

  1. Graph data structures and functions on them – they need to be as natural to work with as trees.
  2. A design for how to specify programs as graphs. (What protocol do operations in the graph satisfy, if any? Are they black boxes? Do they provide a cost model, hardware affinity? etc. One possible shape is sketched after this list.)
  3. Splitting packages like ComputeFramework into modular pieces which communicate via the above.
  4. ???
  5. Profit
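
On (2), one option is to make the protocol a handful of generic functions with safe fallbacks: operations are black boxes by default but can advertise more via dispatch, and passes/schedulers query what they need without having to know about each other. A rough sketch (every name here is hypothetical):

```julia
# Hypothetical operation protocol: generic functions with fallbacks.
abstract type Operation end

# Estimated cost of running `op` on inputs of the given sizes;
# fallback: unknown.
cost(op::Operation, input_sizes...) = nothing

# Preferred hardware (e.g. :cpu, :gpu); fallback: run anywhere.
affinity(op::Operation) = :any

# May an optimisation pass fuse this op with its neighbours?
fusable(op::Operation) = false

# A BLAS-backed matmul can advertise a flop count and CPU affinity
# without knowing which scheduler or pass will ask.
struct MatMul <: Operation end
cost(::MatMul, m, k, n) = 2m * k * n
affinity(::MatMul) = :cpu
```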

Flow.jl is a start on (1). Shashi has had (3) in mind while designing CF. (2) and (4) involve a lot of tough design decisions that we haven't figured out yet. For example, how do we get optimisation passes or schedulers to cooperate without explicitly knowing about each other? (5) will take a while.

Of course, it's absolutely key that the DAG is expressive enough to represent a useful subset of parallel programs. Is dynamic sizing enough to express recursive/irregular problems? What about other abstractions, like distributed channels – how do they fit in? Do they need to?
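
As a sanity check on the recursive case: if evaluating a node is allowed to produce another node, the graph unfolds at runtime and its size can depend on the input. Reusing the toy `Thunk` sketch from above (still illustrative, not an API):

```julia
struct Thunk
    f::Function
    inputs::Tuple
end
node(f, xs...) = Thunk(f, xs)

# One change from the static evaluator: if a node's function returns
# another Thunk, keep evaluating, so the graph grows dynamically.
compute(x) = x
compute(t::Thunk) = compute(t.f(map(compute, t.inputs)...))

# Recursive fib: each node spawns two more until the base case, so the
# DAG's shape is only known at runtime.
fib(n) = n < 2 ? n : node(+, node(fib, n - 1), node(fib, n - 2))
compute(node(fib, 10))  # == 55
```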

To reason about this I think it may be useful to evaluate the dimensions along which parallel programs may vary. Off the top of my head:

E.g. BigLinalg is 001 and NNs are 111. What other dimensions exist, and can we come up with practical examples across all of them?

Any obvious holes in this plan?

ViralBShah commented 8 years ago

I like the plan. Would it make sense to do a quick prototype of all this to see how it works with ALS (alternating least squares) and logistic regression, which are the largest problems we have working with CFW at the moment? They tick all the boxes, and are small enough to meaningfully reason about.

Also, they make for great demos.
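
For reference, the data-parallel skeleton of the logistic regression demo is roughly the following (a plain-Julia sketch of the pattern, not actual CFW code): partition the rows into chunks, map a gradient over the chunks, and reduce with `+`; each map is an independent node for a scheduler to place.

```julia
# The map-reduce shape of distributed logistic regression (plain Julia
# sketch of the pattern, not CFW code).
σ(z) = 1 / (1 + exp(-z))

# Gradient of the logistic loss on one chunk of rows.
chunk_grad(X, y, w) = X' * (σ.(X * w) .- y)

function fit(chunks, w; η = 0.1, iters = 100)
    for _ in 1:iters
        # Each chunk_grad is independent (map); the sum is a reduction.
        g = reduce(+, [chunk_grad(X, y, w) for (X, y) in chunks])
        w = w - η * g
    end
    return w
end
```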

MikeInnes commented 8 years ago

I think that kind of demo would be great for the API work but I'm not sure it makes as much sense for the ideas discussed here.

In order to reason about parallel composition, we need test problems which (a) are composed of multiple smaller parallel problems/algorithms, and/or (b) will benefit from very different runtimes/optimisations cooperating.
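
As a toy instance of (a): a pipeline where one stage is an embarrassingly parallel map-reduce and the next is dense linear algebra that wants BLAS, with the two meeting only at a boundary array (all data below is made up for illustration):

```julia
using LinearAlgebra

# Stage 1: embarrassingly parallel feature extraction over "files",
# composed by a reduction. A task-parallel runtime should own this.
raw      = [randn(100, 4) for _ in 1:8]
features = reduce(vcat, [x .^ 2 for x in raw])

# Stage 2: dense least squares on the combined matrix. BLAS/LAPACK (or
# a GPU backend) should own this. The stages share only `features`.
targets = randn(size(features, 1))
w = features \ targets
```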

Given that this is a reasonably ambitious plan (ok, understatement...), I think the best short-term approach is to keep these ideas in the back of our minds while designing CF and similar libraries, then begin to coax out general solutions from them. (I think this is where @shashi is at the moment as well, although he may well have a better idea.)

shashi commented 7 years ago

The more I think about it, the more I think a great "IR" for array code is going to be some notation like that of TensorOperations.jl (tracked in #45): it allows parallelism, and it talks at a higher level than particular hardware (e.g. CPU, GPU) and software libraries (e.g. a backend can still call BLAS by dispatching on a subexpression).
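
For example, with TensorOperations.jl's `@tensor` index notation (the notation is real; how a backend might lower it is speculation on my part):

```julia
using TensorOperations

A = randn(30, 40); B = randn(40, 50); C = randn(30, 50)

# A contraction over k plus an elementwise addition, stated without any
# commitment to *how* it runs: a backend could hand the contraction to
# BLAS, fuse the addition, or tile/parallelise over blocks of i and j,
# all by dispatching on subexpressions.
@tensor D[i, j] := A[i, k] * B[k, j] + C[i, j]
```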

yuehhua commented 7 years ago

I like the idea. Regarding the DAG workflow, I think it fits the concept of concurrency rather than parallelism. I was inspired by a Go talk by Rob Pike, 'Concurrency Is Not Parallelism'.