harendra-kumar opened 5 years ago
There are some benchmarks mentioned in the monad-par paper; we can try those. In fact, we can put them in the concurrency-benchmarks package and compare streamly, monad-par, parallel, etc. through that package.
To measure scaling performance we used a selection of benchmarks from two sources. The following benchmarks were taken from the Intel CnC Haskell distribution, and converted into two versions, one using Strategies and one using the Par monad:

• blackscholes: An implementation of the Black-Scholes algorithm for modeling financial contracts. This program is a straightforward use of parMap in both Strategies and Par versions.
• nbody: calculate the forces due to gravity between a collection of bodies in 3-dimensional space.

The following benchmarks were obtained from the nofib benchmark suite and converted to use the Par monad:

• mandel: a mandelbrot set renderer.
• matmult: matrix multiplication (unoptimised; using a list-of-lists representation for the matrices).
• minimax: a program to find the best move in a game of 4×4 noughts-and-crosses, using alpha-beta searching of the game tree to a depth of 6 moves. The game tree is evaluated in parallel.
• queens: calculate the number of solutions to the N-queens problem for 14 queens on a 14x14 board. Parallelism is via divide-and-conquer.
• sumeuler: compute the sum of the value of Euler's function applied to each integer up to a given bound.
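To give a sense of what "converted into two versions" means, here is a minimal sketch of the sumeuler benchmark written in both styles. The totient definition, chunk size and bound below are placeholders, not values taken from the paper:

```haskell
import qualified Control.Monad.Par as Par
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

-- Naive Euler totient so that each list element carries real work.
totient :: Int -> Int
totient n = length [ k | k <- [1 .. n], gcd n k == 1 ]

-- Strategies version: evaluate the mapped list in parallel chunks.
sumEulerStrategies :: Int -> Int
sumEulerStrategies n =
    sum (map totient [1 .. n] `using` parListChunk 100 rdeepseq)

-- Par monad version: parMap forks one task per list element.
sumEulerPar :: Int -> Int
sumEulerPar n = sum (Par.runPar (Par.parMap totient [1 .. n]))

main :: IO ()
main = print (sumEulerStrategies 5000, sumEulerPar 5000)
```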
For pure parallelism and for stateless monads (e.g. ReaderT) we should not need the monad-control (MonadBaseControl) constraints. Those should be needed only for monadic streaming with stateful monads (e.g. StateT).
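To make the distinction concrete, here is a small standalone illustration (not streamly code) of what monad-control buys when work runs on another thread: a stateful transformer's state has to be captured and restored via the control machinery, whereas a ReaderT environment can simply be read and passed along. The `forkAndWait` helper is hypothetical:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Concurrent.Async (async, wait)
import Control.Monad.Trans.Control (MonadBaseControl, liftBaseWith, restoreM)
import Control.Monad.State (StateT, runStateT, modify, get)
import Control.Monad.Reader (ReaderT, runReaderT, ask)

-- Run an action of the current (possibly stateful) monad on another thread.
-- liftBaseWith captures the monadic state, restoreM puts it back afterwards.
forkAndWait :: MonadBaseControl IO m => m a -> m a
forkAndWait action = do
    st <- liftBaseWith $ \runInIO -> async (runInIO action) >>= wait
    restoreM st

statefulDemo :: StateT Int IO Int
statefulDemo = do
    forkAndWait (modify (+ 1))   -- state change happens in the worker thread
    get                          -- visible here only because of restoreM

readerDemo :: ReaderT Int IO Int
readerDemo = forkAndWait ask     -- only needs to read the shared environment

main :: IO ()
main = do
    print =<< runStateT statefulDemo 0   -- (1,1)
    print =<< runReaderT readerDemo 42   -- 42
```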
Currently we implement concurrency only for monadic computations. However, the same model can be extended to pure computations as well. For example, the `map` operation mapping a pure function over a stream can be made parallel just like `mapM`, and the `cons` operator can be made parallel just like we have a parallel `consM` (see the sketch below for the workaround available today).

However, pure computations have several differences that make this both simpler and more challenging. A pure computation cannot block, so we need at most as many threads as the number of CPUs. When the work units are tiny we should not add overhead by spreading them across multiple threads; each thread should do a significant amount of work for the parallelism to be beneficial, and the scheduling overhead should not be excessive. We already do this for monadic concurrency, and the same can perhaps be extended to pure concurrency as well.
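As a rough sketch of the workaround available today, a pure function can be lifted into the monad and run through the concurrent `mapM`; a parallel pure `map` would avoid this detour. The combinator names here (`asyncly`, `mapM`) follow the streamly API of the time this issue was opened and may be named differently in later releases:

```haskell
import qualified Streamly.Prelude as S
import Streamly (asyncly)

-- Stand-in for a CPU-heavy pure function.
expensive :: Int -> Int
expensive n = sum [1 .. n * 1000]

main :: IO ()
main = do
    -- Lift the pure function into IO, using ($!) so evaluation happens
    -- inside the concurrent worker rather than lazily in the consumer.
    xs <- S.toList $ asyncly $ S.mapM (\n -> return $! expensive n)
                             $ S.fromList [1 .. 100 :: Int]
    print (sum xs)
```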
We can benchmark the implementation against the existing pure parallelism mechanisms, e.g. `par`, and the `parallel` and `monad-par` packages.
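A rough sketch of how such a comparison could be wired up, assuming the gauge benchmarking library (criterion's API is essentially the same); the workload, input size and group names are placeholders:

```haskell
import Gauge.Main (defaultMain, bench, bgroup, nf)
import Control.Parallel.Strategies (parMap, rdeepseq)
import qualified Control.Monad.Par as Par

-- Placeholder CPU-bound function applied to every element.
work :: Int -> Int
work n = sum [1 .. n * 10000]

input :: [Int]
input = [1 .. 500]

main :: IO ()
main = defaultMain
    [ bgroup "pure-parMap"
        [ bench "strategies" $ nf (sum . parMap rdeepseq work) input
        , bench "monad-par"  $ nf (sum . Par.runPar . Par.parMap work) input
        -- a streamly-based variant would be added here once pure
        -- parallelism is implemented
        ]
    ]
```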