Two different use cases:
1. For a batch analytic job, an external trigger that launches the job.
2. For "pseudo streaming" analytic jobs (i.e. when the job's persistence is "streaming" but it's operating on batch data; Spark will often look like this).
I'd rather not use Akka because I still don't trust its clustering (though it's much improved since 2.5); Kafka is certainly an option, and/or ZK.
So we'd add a "signalling" message to the context APIs, which would check security (you'd need write access)
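Roughly what I'm picturing, as a sketch only (SignalJob, AuthInfo, and hasWriteAccess are all made-up names, not an existing API):

```scala
import akka.actor.{Actor, Status}

case class AuthInfo(user: String, permissions: Set[String])
case class SignalJob(jobId: String, payload: Array[Byte], auth: AuthInfo)

class ContextActor extends Actor {
  // Illustrative check only; a real ACL lookup would live wherever
  // the other context APIs already enforce security
  private def hasWriteAccess(auth: AuthInfo): Boolean =
    auth.permissions.contains("write")

  def receive: Receive = {
    case SignalJob(jobId, payload, auth) if hasWriteAccess(auth) =>
      // Hand the signal to the running job's queue (see the queue discussion below)
      context.child(jobId).foreach(_ ! payload)
    case SignalJob(_, _, _) =>
      sender() ! Status.Failure(new SecurityException("write access required"))
  }
}
```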
In the first case you'd want a centralized queue that Akka/DIM sat on. In the second case you'd want the job itself to have its own queue.
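For the second case, the per-job queue could be as dumb as a blocking queue the job drains between (micro-)batches; just a sketch, names made up, using the Scala 2.13 converters:

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.jdk.CollectionConverters._

// One instance owned by each running job; the signalling API offers
// into it, and the job drains it between batches.
class JobSignalQueue[T] {
  private val q = new LinkedBlockingQueue[T]()

  def offer(sig: T): Unit = q.offer(sig)

  def drain(): List[T] = {
    val buf = new java.util.ArrayList[T]()
    q.drainTo(buf)
    buf.asScala.toList
  }
}
```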
Kafka is a bit of a blunt instrument, but it does have the advantage that there are lots of open source clients that can send messages across it (e.g. Logstash).
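e.g. a bare-bones external producer would just be something like this (topic name and message shape are assumptions, not a defined contract):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SignalProducer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // key = job/context id, value = the signal payload
  producer.send(new ProducerRecord("job-signals", "job-42", """{"action":"trigger"}"""))
  producer.close()
}
```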
(Not directly related, but thinking out loud on the subject of triggers: if someone specifies a file trigger then perhaps, instead of the path being polled directly, it could get added to some file watcher actor)
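Something like a single actor owning a NIO WatchService, driven by a scheduler tick rather than each trigger polling its own path (all names illustrative):

```scala
import java.nio.file.{FileSystems, Path, StandardWatchEventKinds}
import akka.actor.Actor

case class WatchPath(dir: Path)

class FileWatcherActor extends Actor {
  private val watcher = FileSystems.getDefault.newWatchService()

  def receive: Receive = {
    case WatchPath(dir) =>
      // File triggers register their directories here instead of polling
      dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE)
    case "poll" => // sent periodically by a scheduler, so no busy loop
      Option(watcher.poll()).foreach { key =>
        key.pollEvents().forEach(ev => println(s"file event: ${ev.context()}"))
        key.reset()
      }
  }
}
```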