Open qxf2 opened 6 years ago
@qxf2 which state partition example are you talking about?
@aturley the alphabet_partitioned example.
@qxf2
"there is no way of verifying that state partitioning helped."
helped with what?
@SeanTAllen helped with concurrency. The concept of partitioning is introduced over here as:
If all of the application state exists in one state object then only one state computation at a time can access that state object. In order to leverage concurrency, that state needs to be divided into multiple distinct state objects. Wallaroo can then automatically distribute these objects in a way that allows them to be accessed by state computations in parallel.
So I felt that the example should illustrate the above point.
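To make the quoted idea concrete, here is a minimal plain-Python sketch of splitting one state object into several distinct state objects keyed by letter. It does not use the Wallaroo API; `TotalVotes`, `partition_key`, and `apply_votes` are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


class TotalVotes:
    """Hypothetical state object holding the vote count for one partition."""

    def __init__(self):
        self.votes = 0

    def update(self, n):
        self.votes += n
        return self.votes


def partition_key(letter):
    # Assumption: one partition per letter, case-insensitive.
    return letter.lower()


def apply_votes(messages):
    # Each key gets its own state object, so updates to different keys
    # never touch the same object.
    partitions = defaultdict(TotalVotes)
    for letter, n in messages:
        partitions[partition_key(letter)].update(n)
    return {key: state.votes for key, state in partitions.items()}


result = apply_votes([("a", 1), ("b", 2), ("A", 3)])
```

Because each letter has its own state object, nothing in this layout forces updates to "a" and "b" to be serialized against each other. Whether they actually run in parallel depends on how the objects are distributed.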
@qxf2 How does it not illustrate that point?
@SeanTAllen - maybe I misunderstand, but how can the example illustrate gain from concurrency when there is only one worker, one pipeline with one computation? Shouldn't there be at least 2 workers to illustrate that multiple computations can access different parts of state concurrently?
Yes, I think we need clarification on what kind of benefits one can gain on which platform. When running a Python Wallaroo app, partitioning state but still running one machida worker will not improve things.
By 'things' above, I mean parallelized execution and results that arrive sooner.
The example is how to write to the API to accomplish that. Running on multiple workers is a detail. I think we could clarify there but I don't see how having someone go through the extra work of running it on multiple workers would be valuable at this point in their journey. I think we need to be demonstrating that elsewhere.
@SeanTAllen I hadn't thought about the resulting tutorial complexity, and I agree that a 2-worker setup would take away from the main point of the tutorial. Maybe just a line in the tutorial calling out the fact that parallel execution is demonstrated elsewhere would be sufficient.
After the discussion, I am ok if you close this ticket with no change.
@pzel I think a change from "Wallaroo can then automatically distribute these objects in a way that allows them to be accessed by state computations in parallel" to something that makes it a touch clearer that this is "possible" but not guaranteed might work.
Or we could change to:
"Wallaroo can then automatically distribute these objects across a cluster of Wallaroo workers in a way that allows them to be accessed by state computations in parallel" or something like that.
Thoughts?
State partition example should show a two-worker setup. Otherwise, the example is not very illustrative.
Is this a bug, feature request, or feedback?
Feedback
What is the current behavior?
Currently, the example shows a 1-worker setup and there is no way of verifying that state partitioning helped. In theory, yeah - I can nod my head and understand why having small state objects helps, but it would be good to illustrate the point in the example.
What is the expected behavior?
Prove that state partitioning helps when multiple workers are editing the same state space. To work things out by myself, I ended up using: a) time.sleep() in the computation, b) limiting the messages sent, and c) a 2-worker setup.
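The experiment described above can be approximated outside Wallaroo with a rough sketch like the following. It is not the Wallaroo API and not the setup actually used in this report: threads stand in for workers, a lock per partition stands in for "only one state computation at a time can access a state object", and `run_experiment` and its parameters are hypothetical. As in the description, `time.sleep()` makes each computation slow and the message list is kept small.

```python
import threading
import time


def run_experiment(messages, num_partitions, delay=0.05):
    # One lock and one totals dict per partition. With num_partitions=1,
    # every computation contends for the same state object.
    partitions = [
        {"lock": threading.Lock(), "totals": {}} for _ in range(num_partitions)
    ]

    def compute(letter, votes):
        part = partitions[ord(letter) % num_partitions]
        with part["lock"]:
            time.sleep(delay)  # simulate a slow state computation
            part["totals"][letter] = part["totals"].get(letter, 0) + votes

    threads = [threading.Thread(target=compute, args=m) for m in messages]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    # Merge per-partition totals into one dict for comparison.
    merged = {}
    for part in partitions:
        merged.update(part["totals"])
    return merged, elapsed


# A deliberately small message set: two votes for each of four letters.
messages = [(letter, 1) for letter in "abcd"] * 2

single_totals, single_time = run_experiment(messages, num_partitions=1)
part_totals, part_time = run_experiment(messages, num_partitions=4)

print(f"one state object:  {single_time:.2f}s")
print(f"four state objects: {part_time:.2f}s")
```

Both runs produce the same totals, but the partitioned run should finish sooner because computations on different partitions can overlap while they sleep. With a single state object, every computation waits for the one before it.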
What OS and version of Wallaroo are you using? If you have a stacktrace and/or steps to reproduce the issue, that also helps a lot.
Version = 0.5.0. OS is not applicable to this piece of feedback.