State partition example should show a two-worker setup

WallarooLabs / wally

Distributed Stream Processing

https://www.wallaroolabs.com

Apache License 2.0


State partition example should show a two-worker setup #2381

Open qxf2 opened 6 years ago

qxf2 commented 6 years ago

State partition example should show a two-worker setup. Otherwise, the example is not very illustrative.

Is this a bug, feature request, or feedback?

Feedback

What is the current behavior?

Currently, the example shows a 1-worker setup, and there is no way of verifying that state partitioning helped. In theory, yeah - I can nod my head and understand why having small state objects helps, but it would be good to illustrate the point in the example.

What is the expected behavior?

Prove that state partitioning helps when multiple workers are editing the same state space. To work things out by myself, I ended up using: a) time.sleep() in the computation, b) a limited number of messages sent, and c) a 2-worker setup.
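A minimal sketch of that experiment, assuming plain Python threads as stand-ins for the two Wallaroo workers. It does not use the Wallaroo API, and `run`, `WORK_SECONDS`, and the message list are hypothetical. Each message costs a fixed sleep. With one lock guarding all state, the two threads serialize; with one lock per key, they can proceed in parallel.

```python
import threading
import time

# Assumed per-message cost, mimicking time.sleep() inside the computation.
WORK_SECONDS = 0.05


def run(messages, lock_for, n_workers=2):
    """Split messages across n_workers threads and return (elapsed, counts).

    lock_for(key) returns the lock that guards the state for that key.
    """
    counts = {key: 0 for key in set(messages)}  # state entries, one per key

    def worker(batch):
        for key in batch:
            with lock_for(key):           # one computation per guarded state at a time
                time.sleep(WORK_SECONDS)  # simulated computation cost
                counts[key] += 1

    # Worker i gets every n_workers-th message (here: worker 0 sees "a", worker 1 sees "b").
    threads = [threading.Thread(target=worker, args=(messages[i::n_workers],))
               for i in range(n_workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, counts


messages = ["a", "b"] * 4  # a limited number of messages

# Unpartitioned: a single lock guards all application state.
single_lock = threading.Lock()
single_time, single_counts = run(messages, lambda key: single_lock)

# Partitioned: each key's state has its own lock.
partition_locks = {key: threading.Lock() for key in set(messages)}
part_time, part_counts = run(messages, lambda key: partition_locks[key])

print(f"single state object: {single_time:.2f}s")
print(f"partitioned state:   {part_time:.2f}s")
```

With eight messages split evenly between the two threads, the partitioned run should take roughly half as long as the single-lock run. `time.sleep()` releases Python's global interpreter lock, so the threads can genuinely overlap in this simulation; CPU-bound Python code in one process would not.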

What OS and version of Wallaroo are you using? If you have a stacktrace and/or steps to reproduce the issue, that also helps a lot.

Version: 0.5.0. OS is not applicable to this piece of feedback.

aturley commented 6 years ago

@qxf2 which state partition example are you talking about?

qxf2 commented 6 years ago

@aturley the alphabet_partitioned example.

SeanTAllen commented 6 years ago

@qxf2

"there is no way of verifying that state partitioning helped."

helped with what?

qxf2 commented 6 years ago

@SeanTAllen helped with concurrency. The concept of partitioning is introduced over here as:

If all of the application state exists in one state object then only one state computation at a time can access that state object. In order to leverage concurrency, that state needs to be divided into multiple distinct state objects. Wallaroo can then automatically distribute these objects in a way that allows them to be accessed by state computations in parallel.

So I felt that the example should illustrate the above point.
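To make the quoted idea concrete, here is a minimal plain-Python sketch of dividing state into distinct objects, one per letter, in the spirit of the alphabet_partitioned example. `LetterState`, `partition_key`, and `add_votes` are hypothetical names, not Wallaroo's API; the per-object lock stands in for the rule quoted above that only one state computation at a time can access a given state object.

```python
import string
import threading
from dataclasses import dataclass, field


@dataclass
class LetterState:
    """Hypothetical state object: a running vote total for one letter."""
    total: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


# One distinct state object per letter, instead of one object for the whole alphabet.
partitions = {letter: LetterState() for letter in string.ascii_lowercase}


def partition_key(word: str) -> str:
    """Hypothetical partition function: route by the word's first letter."""
    return word[0].lower()


def add_votes(word: str, votes: int) -> int:
    state = partitions[partition_key(word)]
    # Only computations for the same letter contend for this lock;
    # updates for other letters could proceed in parallel.
    with state.lock:
        state.total += votes
        return state.total


add_votes("apple", 3)
add_votes("avocado", 2)
add_votes("banana", 1)
print(partitions["a"].total, partitions["b"].total)  # 5 1
```

Splitting the state this way is what makes parallel access possible; whether it actually happens depends on how many workers are running, which is the question the rest of this thread turns on.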

SeanTAllen commented 6 years ago

@qxf2 How does it not illustrate that point?

qxf2 commented 6 years ago

@SeanTAllen - maybe I misunderstand, but how can the example illustrate the gain from concurrency when there is only one worker and one pipeline with one computation? Shouldn't there be at least 2 workers to illustrate that multiple computations can access different parts of the state concurrently?

pzel commented 6 years ago

Yes, I think we need clarification on what kind of benefits one can gain on which platform. When running a Python Wallaroo app, partitioning state while still running a single machida worker will not improve things. By "things" I mean parallelized execution and results arriving earlier.

SeanTAllen commented 6 years ago

The example is how to write to the API to accomplish that. Running on multiple workers is a detail. I think we could clarify there but I don't see how having someone go through the extra work of running it on multiple workers would be valuable at this point in their journey. I think we need to be demonstrating that elsewhere.

qxf2 commented 6 years ago

@SeanTAllen I hadn't thought about the resulting tutorial complexity, and I agree that a 2-worker setup would take away from the main point of the tutorial. Maybe a single line in the tutorial noting that parallel execution is demonstrated elsewhere would be sufficient.

After the discussion, I am ok if you close this ticket with no change.

SeanTAllen commented 6 years ago

@pzel I think a change from "Wallaroo can then automatically distribute these objects in a way that allows them to be accessed by state computations in parallel" to something that makes it a touch clearer that this is "possible" but not guaranteed might work.

Or we could change to:

"Wallaroo can then automatically distribute these objects across a cluster of Wallaroo workers in a way that allows them to be accessed by state computations in parallel" or something like that.

Thoughts?
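The proposed wording ties parallelism to distributing state objects across a cluster of workers. A minimal sketch of why a single worker gains nothing, assuming a hypothetical round-robin placement; `assign_partitions` is illustrative only and is not Wallaroo's actual placement logic.

```python
import string


def assign_partitions(keys, workers):
    """Hypothetical round-robin placement of state partitions onto workers.

    Not Wallaroo's algorithm; it only shows that distinct state objects
    can live on different workers.
    """
    placement = {w: [] for w in workers}
    for i, key in enumerate(sorted(keys)):
        placement[workers[i % len(workers)]].append(key)
    return placement


two = assign_partitions(string.ascii_lowercase, ["worker-1", "worker-2"])
print(two["worker-1"][:3])  # ['a', 'c', 'e']
print(two["worker-2"][:3])  # ['b', 'd', 'f']

one = assign_partitions(string.ascii_lowercase, ["worker-1"])
print(len(one["worker-1"]))  # 26
```

With two workers, the 26 letter partitions split 13/13, so computations for `a` and `b` can run on different workers. With one worker, all 26 partitions land on the same process, which matches pzel's point that partitioning alone does not speed up a single machida worker.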