IBMStreams / administration

Umbrella project for the IBMStreams organization. This project will be used for the management of the individual projects within the IBMStreams organization.
Other
19 stars 10 forks source link

Proposal: streamsx.cassandra #99

Closed ecurtin closed 7 years ago

ecurtin commented 7 years ago

Proposal: streamsx.cassandra

streamsx.cassandra is a toolkit in active development (and production!) at The Weather Company. It consists of an operator that writes Streams tuples to Cassandra.

The operator is a very thin Java facade for a Scala implementation. It's built using SBT.

Basic Capabilities

The operator is configured by specifying connection information in a ZooKeeper node. Additionally, it provides mechanisms, also configurable in ZK, for writing values as NULLS.

Nearly all SPL types are supported, including sets, lists, and maps.

Null Value Mechanism

For a real-life example, say we have a tuple representing a report from a weather station:

stream<rstring stationID, uint64 timestamp, uint32 tempF, uint32 windspeedMPH, ......>

Old-school meteorological convention specifies that invalid observations are reported as -9999. When these observations are written in Cassandra, however, we don't want to keep them as -9999, we want to take advantage of Cassandra's ability to write nulls. So if I specify in the JSON blob that I store in ZooKeeper:

{
  "tempF" : -9999
}

Any tuples that pass into the operator with the tempF value of -9999 will be written with a tempF value of NULL in Cassandra.

Licensing

The source is licensed under Apache V2.

Currently Supported Versions

namespace com.weather.test;

composite CassandraTest {

    graph

        stream<rstring greeting, uint64 count, list<int32> testList, set<int32> testSet, map<int32, boolean> testMap, int32 nInt> Greeting = Beacon() {
            param
                iterations: 1000000u; //generate 1000000 tuples
                period : 0.5; //generate a tuple every 0.5 seconds
            output
                Greeting:
                    greeting =  "Hello Streams!",
                    count = IterationCount() + 1ul,
                    testList = [1,2,3],
                    testSet = {4, 5, 6},
                    testMap = {7: true, 8 : false, 9: true},
                    nInt = -2147483647;
        }

        () as CoolStuff = com.weather.streamsx.cassandra::CassandraSink(Greeting) {
            param
                connectionConfigZNode: "/cassandra_config";
                nullMapZnode: "/null_values";
        }
}

The znodes specify the connection to a dev Cassandra cluster and specifies that the null value for "nint" is -2147483647.

And here's a sample of the output, which I am pulling using CQL using a call that you should never ever use on a real table :)

cqlsh:testkeyspace> select * from testtable;

 count | greeting       | nint | testlist  | testmap                      | testset
-------+----------------+------+-----------+------------------------------+-----------
    19 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
     2 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    24 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
     3 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    35 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    30 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    16 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    ... etc etc

Future Work

mikespicer commented 7 years ago

+1 FYI, for the configuration you currently store in Zookeeper we added secure application config in V4.2 that could be used for configuration. See details here http://www.ibm.com/support/knowledgecenter/SSCRJU_4.2.0/com.ibm.streams.dev.doc/doc/creating-secure-app-configs-dev.html

petenicholls commented 7 years ago

+1

chanskw commented 7 years ago

+1! very nice proposal btw!

leongor commented 7 years ago

+1

2016-10-07 0:04 GMT+03:00 Samantha Chan notifications@github.com:

+1! very nice proposal btw!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/IBMStreams/administration/issues/99#issuecomment-252088072, or mute the thread https://github.com/notifications/unsubscribe-auth/AGvlA2CSLe0VJaqiVc0FQUgWkWdSW0tJks5qxWJLgaJpZM4KQRUL .

Best regards, Leonid Gorelik.

petenicholls commented 7 years ago

Repository creation underway.

ecurtin commented 7 years ago

Sweet deal, I see the repo that got created! Two questions

  1. Am I good to go ahead and start pushing code?
  2. I've been doing releases for internal teams at TWC, so I'm currently on version 1.3.0. Can I keep that versioning here or do I need to start over? Either is fine, just don't want to step on toes :)
chanskw commented 7 years ago

@ecurtin before you can push code, can you please sign this document: https://github.com/IBMStreams/administration/blob/master/IBMStreams-cla-individual.pdf

Yes, I think you can keep the current version number.

chanskw commented 7 years ago

Repository created and set up. @ecurtin and @cin are initial committers.

ecurtin commented 7 years ago

Woo!! Thanks all!! Here it is! https://github.com/IBMStreams/streamsx.cassandra