Assess Architectural Issues that could Prevent Cluster Deployment

colinsheppard commented 7 years ago

As we do this, let's consider whether / how we might start using Akka streams for certain types of communications. E.g. instead of initializing transit actors in the router worker, we could do the initialization in BeamSim, which could request a stream of transit trips from the router which it then transforms into the actors.

I'm also curious about the pros/cons of using Akka streams for event handling instead of a pub/sub model. I see that our current event handling approach will need to be replaced with a distributed pub/sub system... but what I wonder about is whether the backpressure features of Akka streams would be useful for ensuring that slow event handlers never become overrun by the event stream.

Or maybe we have both, i.e. we use the distributed pub/sub but make one of our subscribers be a Source stream that can be used for certain pipelines where backpressure is an issue (e.g. for live visualization of the results).
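To make the backpressure idea concrete, here is a minimal sketch in plain Python rather than Akka. It assumes a single fast publisher and a single slow handler joined by a bounded buffer; `run_pipeline`, `capacity`, and `handler_delay` are hypothetical names, not anything in BEAM. Akka Streams signals demand asynchronously instead of blocking a thread, so this is only an analogy for the effect: the buffer in front of the slow handler never grows past its capacity.

```python
import queue
import threading
import time


def run_pipeline(events, capacity, handler_delay=0.001):
    """Feed events to a slow handler through a bounded buffer.

    Returns the events in the order they were handled and the largest
    buffer depth the publisher observed.
    """
    buffer = queue.Queue(maxsize=capacity)  # bounded: this is the backpressure
    handled = []
    max_depth = 0
    done = object()  # sentinel telling the handler to stop

    def slow_handler():
        while True:
            item = buffer.get()
            if item is done:
                return
            time.sleep(handler_delay)  # simulate an expensive handler
            handled.append(item)

    worker = threading.Thread(target=slow_handler)
    worker.start()

    for event in events:
        buffer.put(event)  # blocks while the buffer is full
        max_depth = max(max_depth, buffer.qsize())

    buffer.put(done)
    worker.join()
    return handled, max_depth
```

In the hybrid setup described above, only the subscriber that feeds such a pipeline would pay this cost; the other pub/sub subscribers would keep receiving events without being throttled.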

https://doc.akka.io/docs/akka/current/scala/stream/index.html

https://doc.akka.io/docs/akka/snapshot/java/distributed-pub-sub.html

dserdyuk commented 7 years ago

Do we have any metrics (Kamon or other) on average message waiting time and actor queue size? I think we need to take those into account before making any serious design changes.
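As a rough illustration of the two metrics in question, the sketch below timestamps each message when it is enqueued and records the wait when it is dequeued, while tracking the peak queue size. It is plain Python, not the Kamon API, and `InstrumentedMailbox` and its methods are hypothetical names chosen for the example.

```python
import time
from collections import deque


class InstrumentedMailbox:
    """FIFO mailbox that tracks message waiting time and peak queue size."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the timing can be tested
        self._messages = deque()
        self._total_wait = 0.0
        self._dequeued = 0
        self.peak_size = 0

    def enqueue(self, message):
        # Store the arrival time next to the message.
        self._messages.append((self._clock(), message))
        self.peak_size = max(self.peak_size, len(self._messages))

    def dequeue(self):
        enqueued_at, message = self._messages.popleft()
        self._total_wait += self._clock() - enqueued_at
        self._dequeued += 1
        return message

    def average_wait(self):
        # Average time processed messages spent waiting in the queue.
        if self._dequeued == 0:
            return 0.0
        return self._total_wait / self._dequeued
```

A consistently high `average_wait` or a growing `peak_size` on a given actor would be the kind of evidence that could justify a design change there.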

sfwatergit commented 7 years ago

Agreed with Denys. I'll make the Kamon integration and metrics viz top priority; it's mostly done anyway. I've set up a few contexts in the perfmon that should capture routing metrics adequately. Going forward, I'll need to refine them based on feedback from others on what we want measured and visualized, and on how best to integrate the viz suite in production.

Re: streams... I've been thinking about this, and I'm essentially on board, given that we may want to split the output from the events to different sinks (e.g., physsim, file, and beam viz as separate end points). Akka streams, in fact, seem to be well regarded all around from what I can tell in the reactive community. The back pressure mgmt seems useful as well.
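The fan-out described here can be sketched as a simple broadcast, where every event is delivered to each registered sink. This is plain Python rather than Akka Streams, and `EventBroadcaster`, `register`, and `publish` are hypothetical names; the three sinks stand in for the physsim, file, and beam viz end points.

```python
class EventBroadcaster:
    """Deliver every published event to all registered sinks."""

    def __init__(self):
        self._sinks = {}

    def register(self, name, sink):
        # A sink is any callable that accepts a single event.
        self._sinks[name] = sink

    def publish(self, event):
        for sink in self._sinks.values():
            sink(event)


# Usage: three in-memory lists stand in for the real end points.
physsim_events, file_events, viz_events = [], [], []

broadcaster = EventBroadcaster()
broadcaster.register("physsim", physsim_events.append)
broadcaster.register("file", file_events.append)
broadcaster.register("viz", viz_events.append)

broadcaster.publish({"type": "departure", "vehicle": "bus-1"})
```

Note that this naive version runs every sink on the publisher's thread, so one slow sink delays all the others; avoiding that coupling is where per-sink buffering or backpressure would come in.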

Otherwise, I think we will need to better integrate the glokka utility into beam services... Maybe through a facade class. Glokka is designed to work with clusters, so it shouldn't be a problem. Also, we will need to consider resiliency of nodes and what it might entail for a node to go offline while processing its message queue.

The other thought I had there is to containerize viz, physsim, and beamviz as microservices and use Kafka+Camel. There is some benefit there, as I believe we could generalize all three of these as essentially separate web services with well-defined APIs. That might be something to think about well down the line, though. The only reason I'm mentioning it is that it would be a way to steer other orgs to reuse components for their own purposes (with the added benefit of further open source synergy and citations once we put together the architecture paper). Beam viz, for example, is a really useful product for general vehicle trajectory visualization, and I could see it as a good way to generate extra PR, if advertised as such. Again, not a top priority, but something to think about in terms of how we evolve the architecture.

LBNL-UCB-STI / beam

Assess Architectural Issues that could Prevent Cluster Deployment #123