caroljmcdonald / mapr-streams-sparkstreaming-hbase

24 stars 23 forks source link

Commands to run :

Step 1: First compile the project on eclipse: Select project -> Run As -> Maven Install

Step 2: use scp to copy the ms-sparkstreaming-1.0.jar to the mapr sandbox or cluster

also use scp to copy the data sensor.csv file from the data folder to the cluster put this file in a folder called data. The producer reads from this file to send messages.

scp ms-sparkstreaming-1.0.jar user01@ipaddress:/user/user01/. if you are using virtualbox: scp -P 2222 ms-sparkstreaming-1.0.jar user01@127.0.0.1:/user/user01/.

Create the topic

maprcli stream create -path /user/user01/pump -produceperm p -consumeperm p -topicperm p maprcli stream topic create -path /user/user01/pump -topic sensor -partitions 3

get info on the topic maprcli stream info -path /user/user01/pump

To run the MapR Streams Java producer and consumer:

java -cp ms-sparkstreaming-1.0.jar:mapr classpath solution.MyProducer

java -cp ms-sparkstreaming-1.0.jar:mapr classpath solution.MyConsumer

Step 3: To run the Spark streaming consumer:

start the streaming app

spark-submit --class solution.SensorStreamConsumer --master local[2] ms-sparkstreaming-1.0.jar

ctl-c to stop

step 4: Spark streaming app writing to hbase

first make sure that you have the correct version of HBase installed, it should be 1.1.1:

cat /opt/mapr/hbase/hbaseversion 1.1.1

Next make sure the Spark HBase compatibility version is correctly configured here: cat /opt/mapr/spark/spark-1.5.2/mapr-util/compatibility.version hbase_versions=1.1.1

If this is not 1.1.1 fix it.

Create an hbase table to write to: launch the hbase shell $hbase shell

create '/user/user01/sensor', {NAME=>'data'}, {NAME=>'alert'}, {NAME=>'stats'}

Step 4: start the streaming app writing to HBase

spark-submit --class solution.HBaseSensorStream --master local[2] ms-sparkstreaming-1.0.jar

ctl-c to stop

Scan HBase to see results:

$hbase shell

scan '/user/user01/sensor' , {'LIMIT' => 5}

scan '/user/user01/sensor' , {'COLUMNS'=>'alert', 'LIMIT' => 50}

Step 5: exit, run spark (not streaming) app to read from hbase and write daily stats

spark-submit --class solution.HBaseReadRowWriteStats --master local[2] ms-sparkstreaming-1.0.jar

$hbase shell

scan '/user/user01/sensor' , {'COLUMNS'=>'stats', 'LIMIT' => 50}

cleanup if you want to re-run:

maprcli stream topic delete -path /user/user01/pump -topic sensor maprcli stream topic create -path /user/user01/pump -topic sensor -partitions 3 hbase shell truncate '/user/user01/sensor'