IBMStreams / streamsx.topology

Develop streaming applications for IBM Streams in Python, Java & Scala.
http://ibmstreams.github.io/streamsx.topology
Apache License 2.0
29 stars 43 forks source link

"SetXXXX" paradigm for streams. #462

Closed ddebrunner closed 5 years ago

ddebrunner commented 7 years ago

Most operations applied to a stream affect downstream processing , e.g. filter, parallel etc.

For various actions such as source UDP and consistent regions we need to apply an operation on a stream that affects the operator that produces the stream. Proposal is to use setXXXX, such as (psuedo code):

// Set this stream to be the start of a consistent region stream.setConsistent(time=30)

// Set this stream to be parallel with width of 3 stream.setParallel(width=3)

chanskw commented 7 years ago

I like the idea of supporting consistent region and UDP in Java.

In this paradigm, how would one tell where the consistent region start or finish. If I set a stream to be consistent, would the streams created by downstream operations also be consistent?

How would one indicate when a consistent region end?

An idea is to create the concept of "ConsistentStream" / "UDPStream" like this:

ConsistentStream ccStream = streams.setConsistent(time=30) ConsistentStream ccStream2 = ccStream.transform(...);

People can query properties about the consistent or UDP stream if we have special objects dedicated to these concepts.

To end consistent region: TStream normalStream = ccStream.setAutonomous(...);

ddebrunner commented 7 years ago

Right, the consistent region paradigm established by SPL would be followed, anything that is reachable from the source is consistent.

Setting the stream to be autonomous would terminate the region.

ddebrunner commented 7 years ago

Pull #506 started.

ddebrunner commented 7 years ago

@chanskw Have specific stream types quickly becomes a nightmare, I looked at that for supporting the concept of a keyed stream (e.g. KeyedTStream) and it becomes ugly quickly. Eg., in your example what if the stream is in a UDP region and a consistent region, is that another interface? SPL stream and UDP and consistent etc. etc.

I think pretty much all of the time an application (or portion of an application) should not care if it's in consistent, autonomous or UDP regions.

chanskw commented 7 years ago

Pinging to see where we are with this support. In our discussions with streamsx.health clients, HA and application HA is important in that domain. The healthcare analytics platform uses the Java topology to define microservices. We will also go into using Python in the future. So, as more of our clients use the health toolkit in healthcare setting, consistent region support will become more important.

In thinking about this some more, another issue to think about is how we can persist and restore state in the custom Java code that the topology call. For example, we can define a custom class that maintain state, and use these classes as part of the topology. How do we allow these classes to participate in a checkpoint and reset in this scenario?

ddebrunner commented 7 years ago

another issue to think about is how we can persist and restore state in the custom Java code that the topology call

Already supported in Java.

http://ibmstreams.github.io/streamsx.topology/doc/javadoc/com/ibm/streamsx/topology/Topology.html#checkpointPeriod-long-java.util.concurrent.TimeUnit-

I believe one can use consistent regions by starting the region with an SPL operator that is annotated with @consistent region. At this time only SPL (Java or C++) primitive operators can be source (restartable) operators.

chanskw commented 7 years ago

Ok... I think you mean that the topology can wrapper SPL composite, where its source operator is annotated as the start of a consistent region. Is this correct?

In the health toolkit, we have written "ingest services" in Java that will ingest data from external sources. So this approach will not work for us. But I think I understand what you are saying.

ddebrunner commented 7 years ago

We can take a look to see what supporting a consistent region would mean for a Java source function.