amazon-archives / dynamodb-janusgraph-storage-backend

The Amazon DynamoDB Storage Backend for JanusGraph
Apache License 2.0

Configuring SparkGraphComputer for OLAP #269

Open sandeepdoctily opened 6 years ago

sandeepdoctily commented 6 years ago

Hi All

I am trying to configure SparkGraphComputer with DynamoDB Local. My configuration is below. Could you help me get it working?

# TinkerPop Hadoop Graph for OLAP

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph

# Set the default OLAP computer for graph.traversal().withComputer()

gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

gremlin.hadoop.graphInputFormat=org.apache.hadoop.dynamodb.read.DynamoDBInputFormat

gremlin.hadoop.graphOutputFormat=org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat

####################################
# SparkGraphComputer Configuration
####################################

spark.master=local[*]

spark.executor.memory=200m

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.akka.timeout=500000

spark.kryo.registrationRequired=false

spark.storage.memoryFraction=0.2

spark.eventLog.enabled=true

spark.eventLog.dir=/tmp/spark-event-logs

spark.ui.killEnabled=true

spark.dynamicAllocation.enabled=false

spark.network.timeout=60000

spark.rpc.askTimeout=80000

spark.sql.broadcastTimeout=90000

spark.serializer=org.apache.spark.serializer.KryoSerializer

janusgraphmr.ioformat.conf.storage.backend=com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager

janusgraphmr.ioformat.conf.storage.dynamodb.client.credentials.class-name=com.amazonaws.auth.BasicAWSCredentials

janusgraphmr.ioformat.conf.storage.dynamodb.client.credentials.constructor-args=access,secret

janusgraphmr.ioformat.conf.storage.dynamodb.client.signing-region=us-east-1

janusgraphmr.ioformat.conf.storage.dynamodb.client.endpoint=http://localhost:8000

gremlin.graph=org.janusgraph.core.JanusGraphFactory

metrics.enabled=true

metrics.prefix=j

metrics.csv.interval=1000

metrics.csv.directory=metrics

storage.write-time=1 ms

storage.read-time=1 ms

storage.backend=com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager

storage.dynamodb.client.credentials.class-name=com.amazonaws.auth.BasicAWSCredentials

storage.dynamodb.client.credentials.constructor-args=access,secret

storage.dynamodb.client.signing-region=us-east-1

storage.dynamodb.client.endpoint=http://localhost:8000
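A configuration of this size is easy to get subtly wrong when several property files are pasted together. For example, `gremlin.graph` and `spark.serializer` each appear twice above, and `gremlin.graph` is given two different values. The following is a minimal sketch, using only the Java standard library, of a check that flags repeated keys and values that begin with `=` (which is what a doubled separator yields under `java.util.Properties`). `PropertiesLint` and `lint` are hypothetical names for illustration, not part of JanusGraph or TinkerPop, and how a repeated key is actually resolved depends on the parser that loads the file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PropertiesLint {
    // Returns one warning per suspicious line; an empty list means nothing was flagged.
    public static List<String> lint(List<String> lines) {
        List<String> warnings = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and properties-style comments.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value line; this sketch ignores it
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (!seen.add(key)) {
                warnings.add("duplicate key: " + key);
            }
            if (value.startsWith("=")) {
                warnings.add("value starts with '=': " + key);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Illustrative input only; in practice the lines would be read from the properties file.
        List<String> sample = List.of(
                "gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph",
                "spark.master=local[*]",
                "gremlin.graph=org.janusgraph.core.JanusGraphFactory");
        lint(sample).forEach(System.out::println);
    }
}
```

The check is deliberately simple: it only splits on the first `=` and does not handle `:` separators, escaped characters, or line continuations.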

When I run a query, I get the exception below:

gremlin> g.V().count()

java.lang.RuntimeException: class org.apache.hadoop.dynamodb.read.DynamoDBInputFormat not org.apache.hadoop.mapreduce.InputFormat
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2221)
    at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$0(SparkGraphComputer.java:177)

amcp commented 6 years ago

DynamoDBInputFormat is not implemented yet, but it could be implemented by copying from, or depending on, the DynamoDB EMR connector. See https://github.com/awslabs/emr-dynamodb-connector/blob/master/emr-dynamodb-hadoop/src/main/java/org/apache/hadoop/dynamodb/read/DynamoDBInputFormat.java
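The exception in the question has the shape that a subtype check produces: the named class was found on the classpath, but it is not a subtype of the expected `org.apache.hadoop.mapreduce.InputFormat`. The following is a minimal, standard-library-only sketch of that kind of check, not Hadoop's actual source. `InputFormatCheck`, `requireSubtype`, and the stand-in types are hypothetical names. Whether the connector class targets the older `org.apache.hadoop.mapred` API instead is an assumption that should be verified against the linked source.

```java
public class InputFormatCheck {
    // Hypothetical stand-ins for the two Hadoop APIs; these are not the real types.
    abstract static class NewApiInputFormat {}   // plays the role of org.apache.hadoop.mapreduce.InputFormat
    interface OldApiInputFormat {}               // plays the role of org.apache.hadoop.mapred.InputFormat

    // Hypothetical input format that only implements the old-style interface.
    static class OldStyleFormat implements OldApiInputFormat {}

    // The class loads fine, but it must also be assignable to the expected type.
    public static <U> Class<? extends U> requireSubtype(Class<?> candidate, Class<U> expected) {
        if (!expected.isAssignableFrom(candidate)) {
            throw new RuntimeException(candidate + " not " + expected.getName());
        }
        return candidate.asSubclass(expected);
    }

    public static void main(String[] args) {
        try {
            requireSubtype(OldStyleFormat.class, NewApiInputFormat.class);
        } catch (RuntimeException e) {
            // Prints a message of the same "class X not Y" shape as the reported exception.
            System.out.println(e.getMessage());
        }
    }
}
```

If that assumption holds, pointing `gremlin.hadoop.graphInputFormat` at the connector class would fail this way regardless of the other settings, which would be consistent with the suggestion above to write an input format for this backend rather than reuse the connector's class directly.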

danielwhatmuff commented 5 years ago

Is DynamoDBInputFormat now implemented?