Is this a bug, feature request, or feedback?
Enhancement/performance improvement
What is the current behavior?
The collective muting & unmuting behavior of DataChannel actors needs to be reduced. The effective startup time of Wallaroo can be dramatically affected by small changes in pipeline structure, such as where a sub-pipeline is .merge()d into a main pipeline.
What is the expected behavior?
Less-than-exponential(-seeming) behavior as the # of Wallaroo workers grows.
What OS and version of Wallaroo are you using?
Ubuntu Bionic/18.04 LTS + Wallaroo @ commit 35d2038
Steps to reproduce?
See README.md in tarball at http://wallaroolabs-dev.s3.amazonaws.com/scott/count2.tar.gz. Instructions include options for building & running a demonstration test via a VM or Docker.
Test output such as https://gist.github.com/slfritchie/5d6cbcb243d8197f61c659445feb9454 shows a set of results when starting a 2, 4, or 8 worker cluster whose application pipeline includes 3 sources and two .merge() operations on sub-pipelines. As soon as worker0 polls ready, a single operation is sent to the first source, and the sink's output is shown with a timestamp.
Relevant times to examine (and consider that each test was run only once):
Time from first worker logging |~~ INIT PHASE IV: Cluster is ready to work! ~~| to last worker's log.
Time from last worker's ready log to the processing time of the 1 work item processed by Wallaroo.
In the source as-is, those times are:
If the source is modified as suggested in the README.md file, moving the .merge(other) call from line 57 to line 62, i.e., immediately before the .to_sink() call, then the times drop significantly.
The timing with the (*) label is most anomalous: it's too big compared to the typical latency. But these simple results show that a small change in pipeline definition can change the time of first processed item end-to-end from roughly 10 seconds down to 1.3 seconds.
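For anyone repeating the measurement, the two intervals listed under "Relevant times to examine" could be computed with something like the sketch below. It assumes the per-worker ready timestamps and the sink output timestamp have already been pulled out of the logs as seconds; the function name, the dict layout, and the example values are all hypothetical and are not taken from the test tarball or the gist.

```python
from typing import Dict, Tuple


def startup_intervals(ready_at: Dict[str, float],
                      first_output_at: float) -> Tuple[float, float]:
    """Return (ready_spread, ready_to_output) in seconds.

    ready_at maps a worker name to the time that worker logged its
    "Cluster is ready to work!" line. first_output_at is the timestamp
    shown with the sink's output for the single work item.
    """
    if not ready_at:
        raise ValueError("need at least one worker ready timestamp")
    first_ready = min(ready_at.values())
    last_ready = max(ready_at.values())
    # Interval 1: first worker's ready log to last worker's ready log.
    # Interval 2: last worker's ready log to the processed work item.
    return last_ready - first_ready, first_output_at - last_ready


# Placeholder timestamps for illustration only, NOT measured results.
example_ready = {"worker0": 100.0, "worker1": 100.5, "worker2": 102.0}
spread, to_output = startup_intervals(example_ready, 103.25)
print(f"first-to-last ready: {spread:.2f}s, "
      f"last ready to output: {to_output:.2f}s")
```

Running this for each cluster size, with .merge(other) in either position, would give directly comparable numbers; since each test was run only once, repeating runs and reporting a range would make the comparison more trustworthy.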