CiscoCloud / edge-test

Test implementation for containerization of the edge components and their validation results.
Apache License 2.0

create a go kafka consumer to save off kafka data to cassandra #6

Open joestein opened 9 years ago

joestein commented 9 years ago

We want to do this in a table with DataStax Enterprise running so the data can be Solr indexed; this is VERY important for us so we can search logline.line and similar fields. It also allows us to index the data in the different ways the object can be categorized (e.g. by source, tag, logtype, etc.).
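As a rough sketch of what the search side would buy us, once DSE Search is enabled on the table a Solr core is created for it and free-text queries over the line column can be issued straight from CQL via DSE's solr_query pseudo-column (table and field names here are assumptions):

```sql
-- Sketch, assuming a table named loglines with columns line and source,
-- and DSE Search enabled on it. solr_query is a DSE-specific feature.
SELECT * FROM loglines
WHERE solr_query = 'line:*timeout* AND source:edge1';
```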

joestein commented 9 years ago

We should use https://github.com/gocql/gocql
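A minimal sketch of the consumer's write path might look like the following. The LogLine struct fields and the table/column names are assumptions (the real Avro schema will dictate them); the commented lines show where gocql would actually execute the statement:

```go
package main

import (
	"fmt"
	"strings"
)

// LogLine mirrors the Avro LogLine we consume from Kafka.
// Field names here are an assumption; adjust to the real schema.
type LogLine struct {
	ID      string // timeuuid taken from the event itself, not now()
	Line    string
	Source  string
	LogType int64
}

// insertCQL builds the parameterized INSERT we would hand to gocql,
// plus the bound values in matching column order.
func insertCQL(table string, l LogLine) (string, []interface{}) {
	cols := []string{"id", "line", "source", "logtype"}
	vals := []interface{}{l.ID, l.Line, l.Source, l.LogType}
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(cols)), ", ")
	stmt := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		table, strings.Join(cols, ", "), placeholders)
	return stmt, vals
}

func main() {
	stmt, vals := insertCQL("loglines", LogLine{
		ID: "a4a70900-24e1-11df-8924-001ff3591711",
		Line: "connection timeout", Source: "edge1", LogType: 1,
	})
	fmt.Println(stmt)
	fmt.Println(len(vals))

	// With gocql (github.com/gocql/gocql) this would be executed as:
	//   cluster := gocql.NewCluster("127.0.0.1")
	//   session, err := cluster.CreateSession()
	//   ...
	//   err = session.Query(stmt, vals...).Exec()
}
```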

edgefox commented 9 years ago

@joestein What structure should we expect from kafka and what should be exported to Cassandra?

joestein commented 9 years ago

We are getting an Avro LogLine from Kafka. This could be materialized as a few different Cassandra tables. You can save the timeuuid using the value from the LogLine, which is nice, clean and unique (http://docs.datastax.com/en/cql/3.1/cql/cql_reference/timeuuid_functions_r.html), and store that instead of using now(). That keeps the ordering atomic to when the event happened rather than when the system thought it happened.

We should have another table that stores now() as a clustering key for a similar partition key. A Spark job with Kafka and Cassandra can then provide continuous monitoring, audits and alerts for when these are drifting, what the current drift is at each point "touching" the event, and what time the system actually thinks it is. This value should accompany the NTP drift value so we can also compare against the time the server thinks it is.
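A pair of table sketches along those lines (all names and the exact column set are assumptions; the point is event-time vs. ingest-time clustering):

```sql
-- Event-time table: the timeuuid comes from the LogLine itself,
-- so rows cluster by when the event actually happened.
CREATE TABLE loglines_by_event_time (
    source    text,
    event_id  timeuuid,   -- taken from the LogLine, not now()
    line      text,
    logtype   bigint,
    PRIMARY KEY ((source), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);

-- Ingest-time table: similar partition key, but clustered on now()
-- at write time, so a Spark job can join the two and report drift.
CREATE TABLE loglines_by_ingest_time (
    source     text,
    ingest_id  timeuuid,  -- now() when the consumer wrote the row
    event_id   timeuuid,
    ntp_drift  double,    -- server-reported NTP drift for comparison
    PRIMARY KEY ((source), ingest_id)
);
```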

Other tables should be made up from the tag structures. Every tag value should be a partition, with the clustering key being the time of the event. We should also do the same for source, and for every combination with source. The actual breakup of partitions should come from the logtype index value. Sometimes different tag keys together, along with some of the matched values, will make up a primary key.
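One possible shape for such a tag-keyed table, sketched under the same naming assumptions as above (a real layout would likely need one table per query pattern, e.g. a source+tag variant):

```sql
-- Partition by tag key/value so a single tag's events live together,
-- clustered by the event's timeuuid for time-ordered reads.
CREATE TABLE loglines_by_tag (
    tag_key    text,
    tag_value  text,
    logtype    bigint,   -- folded into the partition to break it up
    event_id   timeuuid, -- time of the event, from the LogLine
    source     text,
    line       text,
    PRIMARY KEY ((tag_key, tag_value, logtype), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);
```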