WallarooLabs / wally

Distributed Stream Processing

Mute/unmute overhead needs reduction #3119

Open slfritchie opened 4 years ago

slfritchie commented 4 years ago

Is this a bug, feature request, or feedback?

Enhancement/performance improvement

What is the current behavior?

The overhead of the collective muting & unmuting behavior of DataChannel actors needs to be reduced. The effective startup time of Wallaroo can be dramatically affected by small changes in pipeline structure, such as where a sub-pipeline is .merge()d into the main pipeline.

What is the expected behavior?

Less-than-exponential(-seeming) growth in startup time as the number of Wallaroo workers grows.

What OS and version of Wallaroo are you using?

Ubuntu Bionic/18.04 LTS + Wallaroo @ commit 35d2038

Steps to reproduce?

See README.md in tarball at http://wallaroolabs-dev.s3.amazonaws.com/scott/count2.tar.gz. Instructions include options for building & running a demonstration test via a VM or Docker.

Test output such as https://gist.github.com/slfritchie/5d6cbcb243d8197f61c659445feb9454 shows results from starting a 2-, 4-, or 8-worker cluster whose application pipeline includes 3 sources and two .merge() operations on sub-pipelines. As soon as polling shows worker0 is ready, a single operation is sent to the first source, and the sink's output is shown with a timestamp.
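For orientation, the pipeline in the test application has roughly the following shape. This is only a minimal sketch in the Wallaroo Python API: make_source(), make_sink(), count, and tally are placeholder names rather than the actual code in count2.tar.gz, and the exact stages between the second merge and the sink are assumed.

```python
import wallaroo

def application_setup(args):
    # Placeholders: make_source()/make_sink() stand in for the real source/sink
    # configurations, and count/tally for the @wallaroo.computation functions
    # in the test app.  The structural point is where the second .merge() sits.
    other = (wallaroo.source("source 2", make_source(args, 2))
             .to(count))

    pipeline = (wallaroo.source("source 0", make_source(args, 0))
                .merge(wallaroo.source("source 1", make_source(args, 1)))
                .to(count)
                .merge(other)   # as shipped: sub-pipeline merged mid-pipeline (~line 57)
                .to(tally)      # further stages between the merge and the sink
                .to_sink(make_sink(args)))
    return wallaroo.build_application("count2", pipeline)
```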

Relevant times to examine, keeping in mind that each test was run only once (a rough log-extraction sketch follows the list):

  1. Time from the first worker logging |~~ INIT PHASE IV: Cluster is ready to work! ~~| to the last worker logging the same line.
  2. Time from the last worker's ready log to the processing of the single work item sent to Wallaroo.
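For anyone re-running the test, here is one rough way to pull those two intervals out of the worker logs. It assumes each relevant log line (and the sink's output timestamp) begins with an ISO-8601 timestamp; parse_ts() will need adjusting to whatever format the logs actually use.

```python
#!/usr/bin/env python3
"""Rough helper: compute the two intervals from worker logs plus the sink timestamp."""
import sys
from datetime import datetime

READY = "INIT PHASE IV: Cluster is ready to work!"

def parse_ts(text):
    # Assumption: the timestamp is the first whitespace-delimited field.
    return datetime.fromisoformat(text.split()[0])

def ready_time(log_path):
    with open(log_path) as f:
        for line in f:
            if READY in line:
                return parse_ts(line)
    raise SystemExit(f"no ready line found in {log_path}")

def main(sink_ts, worker_logs):
    ready = [ready_time(p) for p in worker_logs]
    first, last = min(ready), max(ready)
    print("1. first ready -> last ready:", (last - first).total_seconds(), "sec")
    print("2. last ready  -> first item:",
          (parse_ts(sink_ts) - last).total_seconds(), "sec")

if __name__ == "__main__":
    # usage: ./intervals.py SINK_TIMESTAMP worker0.log worker1.log ...
    main(sys.argv[1], sys.argv[2:])
```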

In the source as-is, those times are:

  1. 2 worker: 0.020 sec, 4 worker: 1.105 sec, 8 worker: 2.354 sec
  2. 2 worker: 0.036 sec, 4 worker: 3.307 sec, 8 worker: 7.699 sec

If the source is modified as suggested in the README.md file, moving the .merge(other) call from line 57 to line 62, i.e., immediately before the .to_sink() call (sketched below the timings), then the times drop significantly.

  1. 2 worker: 0.001 sec, 4 worker: 0.050 sec, 8 worker: 0.012 sec
  2. 2 worker: 1.031 sec(*), 4 worker: 1.118 sec, 8 worker: 1.244 sec
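Relative to the sketch above, the suggested change amounts to reordering a single call (same placeholder names; again only an approximation of the actual source):

```python
    pipeline = (wallaroo.source("source 0", make_source(args, 0))
                .merge(wallaroo.source("source 1", make_source(args, 1)))
                .to(count)
                .to(tally)
                .merge(other)   # moved: now immediately before .to_sink() (~line 62)
                .to_sink(make_sink(args)))
```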

The timing with the (*) label is the most anomalous: it is too large relative to the typical latency. But even these simple results show that a small change in pipeline definition can change the end-to-end time to the first processed item from roughly 10 seconds down to roughly 1.3 seconds.