IBMStreams / administration

Umbrella project for the IBMStreams organization. This project will be used for the management of the individual projects within the IBMStreams organization.
Other
19 stars 10 forks source link

Proposal for a new project: streamsx.topology #62

Closed ddebrunner closed 9 years ago

ddebrunner commented 9 years ago

A project that supports building streaming topologies (applications) for InfoSphere Streams in different programming languages, such as Java.

We ( @wmarshall484 and myself) have initial code for writing streaming topologies in Java using a functional style, with Java objects as tuples, integration with SPL and Streams features, such as User Defined Parallelism, Import/Export etc. Thus a Java developer can develop a streaming application using just Java (and no SPL knowledge required) that runs on InfoSphere Streams in distributed or standalone. A Java developer can also choose to utilize existing SPL operators etc., thus their application can include streams with SPL schemas as well as streams with Java objects as tuples. The initial code will include tests and sample applications.

The code works against Streams 4.0, and users would install it as a SPL toolkit.

Here's a simple example of a application that monitors a directory for new files and outputs any lines in the files that match a regular expression. The regular expression matching is an example of the functional approach, where an anonymous class acts as a function to filter lines, basically each tuple (as a String) on the stream will result into a call to the 'test' method, and if it returns true then the tuple will appear on the output stream, otherwise the input tuple is discarded.

  public static void main(String[] args) throws Exception {
    String contextType = args[0];
    String directory = args[1];
    final Pattern pattern = Pattern.compile(args[2]);

    // Define the topology
    Topology topology = new Topology("RegexGrep");

    // All streams with tuples that are Java String objects
    TStream<String> files = directoryWatcher(topology, directory);
    TStream<String> lines = textFileReader(files);
    TStream<String> filtered = lines.filter(new Predicate<String>() {

        @Override
        public boolean test(String v1) {
           // Pass the line through if it matches the
           // regular expression pattern
           return matcher.reset(v1).matches();
        }

        // Recreate the matcher (which is not serializable)
        // when the object is deserialized.
        transient Matcher matcher;         
        private Object readResolve() throws ObjectStreamException {
            matcher = pattern.matcher("");
            return this;
        }
    });

    // For debugging just print out the tuples
    filtered.print();

    // Execute the topology
    Future<?> future = StreamsContextFactory.getStreamsContext(contextType).submit(topology);
    Thread.sleep(10000);
    future.cancel(true);      
}
hildrum commented 9 years ago

+1

mikespicer commented 9 years ago

+1

chanskw commented 9 years ago

+1 Propose to keep voting open until tomorrow May 27. Will create repository before the end of the week.

Please let me know who the initial committers should be. Who is @wmarshall484? and has he / she signed this document? https://github.com/IBMStreams/administration/blob/master/IBMStreams-cla-individual.pdf

leongor commented 9 years ago

Looks very interesting, a couple of questions:

  1. Can you elaborate a little bit about integration with SPL?
  2. Does this topology run directly on JVM or there is some kind of JAVA to SPL conversion?
  3. Where these 'directoryWatcher' and 'textFileReader' come from? Is there some kind of Standard Toolkit for Java?
  4. How then custom operators/functions can be added?
ddebrunner commented 9 years ago

@leongor

A1: As well as a Java topology having streams of Java objects, the api supports streams of SPL tuples, using the Java Operator API Tuple interface to represent a tuple in Java. These SPL streams are usually a result of including SPL operators (primitives, composite or Custom) in the topology. For example a Java developer could include a SPL composite developed by an SPL developer in their topology. Dynamic connections are also supported, meaning an SPL application can export/import to/from a Java topology application.

A2: Some topologies can run embedded within the JVM that declares them, but to run in standalone or distributed against InfoSphere Streams then we create a SPL application as a new 4.0 Streams bundle that implements the topology, using Java primitive operators to execute any Java logic.

A3: They are methods provided by the api, so it's similar to a toolkit. Any new functionality can be provided by a static method in any class that creates the required topology elements, so it's easy to externally enhance the api.

A4: The example above has an example of a custom filtering of the stream, using an anonymous class as a function. Once Java 8 is supported then lambda functions would be supported. See also A3. for how a common transformation or sub-topology can be provided.

ddebrunner commented 9 years ago

@chanskw

Initial committers would be @ddebrunner and @wmarshall484 (William Marshall) once he has signed the CLA.

cancilla commented 9 years ago

+1

leongor commented 9 years ago

+1

petenicholls commented 9 years ago

+1....repository will be created.

chanskw commented 9 years ago

http://ibmstreams.github.io/streamsx.topology