amazon-archives / dynamodb-janusgraph-storage-backend

The Amazon DynamoDB Storage Backend for JanusGraph
Apache License 2.0

Running aws lambda with TitanDb and DynamoDB as a backend #41

Closed SundeepK closed 7 years ago

SundeepK commented 8 years ago

This is more of a question than an issue, but I wanted to know whether it's a requirement to run a Gremlin Server in order to use DynamoDB as a backend. After playing around with things, it seems that you only need to configure your TitanGraph instance with the DynamoDB backend and its endpoint. This suggests that I can update and query my graph backend directly from each Lambda invocation by simply instantiating a graph instance.

My thought was that I could create my backend graph structure once and then let my Lambdas populate and query the graph database. Is that a reasonable use case for TitanDB with DynamoDB as a backend? Running a fleet of Gremlin Servers to handle querying and updating the graph database seems like a lot of overhead.

lionelport commented 8 years ago

You're right, this is not an issue and should have been raised in a forum or as a Stack Overflow question.

The simple answer is Gremlin server is not required. If your programming language is Java (or a JVM language) you can use Titan as a library and point it directly at the storage backend.

SundeepK commented 8 years ago

Thank you for the reply, will raise questions on stack overflow instead.

GreyEcologist commented 8 years ago

@SundeepK Hey Sundeep, were you able to set up and update TitanDB via Lambda?

SundeepK commented 8 years ago

@GreyEcologist I haven't got around to doing that yet so I don't know how well it performs. I'm still evaluating whether Titan is a good choice for our use case. But I don't see any immediate problems with using AWS Lambda with TitanDB. Will update you if I set this up.

gkrizek commented 8 years ago

@SundeepK @GreyEcologist, have either of you played with TitanDB in Lambda yet? I'm interested in trying it out, but there's little to no information on setting it up or how well it runs.

kurtmaile commented 8 years ago

Hi all, any further updates on this? I too would like to see an example of API Gateway -> Lambda -> TitanDB that follows best practices. It would make a great blog post.

I actually wish AWS offered this as a service, properly integrated into DynamoDB, so there are no servers to manage or scale. How cool would that be?

ViswanathLekshmanan commented 7 years ago

Any update on this?

amcp commented 7 years ago

Currently, serving requests out of Gremlin Server + JanusGraph in Lambda would require a few changes to the JanusGraph core to make it start up faster. You would also have to figure out a way of reusing a pool of UNIQUE_MACHINE_IDs so that you do not run out of IDs in your ID pool, since you get an ID lease each time JanusGraph starts up. Perhaps a good first step would be to raise this as an issue on the JanusGraph project. At the same time, the DynamoDB storage backend could add configuration to make its part start faster as well. For example, the ensureStore logic could be skipped if you set a configuration option.
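
One way to limit how often a graph is opened (and therefore how often an ID lease is taken) is to keep a single instance per Lambda container and reuse it on warm invocations. The sketch below is an assumption, not something JanusGraph or this backend provides: WarmContainerCache is a hypothetical helper, and the String it caches merely stands in for a real graph instance.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: creates the expensive object once and returns the same one afterwards.
class WarmContainerCache<T> {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    WarmContainerCache(Supplier<T> factory) {
        this.factory = factory;
    }

    // The factory runs only on the first call; later calls reuse the cached instance.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        // The String is a stand-in for opening a real graph (assumption for illustration).
        WarmContainerCache<String> cache =
                new WarmContainerCache<>(() -> "graph-" + opens.incrementAndGet());
        cache.get(); // first call: the factory runs
        cache.get(); // second call: the cached instance is reused
        System.out.println("opens = " + opens.get()); // prints "opens = 1"
    }
}
```

In a Lambda handler, the cache would typically live in a static field so that it survives between invocations of the same container. This only reduces the number of startups per container; it does not address the ID pool across many concurrent containers.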

ViswanathLekshmanan commented 7 years ago

Is there any option to run the Titan graph database over DynamoDB without Gremlin Server? I'm trying to avoid EC2 and use only AWS Lambda.

amcp commented 7 years ago

You can use JanusGraph without Gremlin server. Just depend on dynamodb-janusgraph-storage-backend as a library in your Lambda package.
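
A minimal sketch of the settings an embedded graph might be given, using only java.util.Properties so it runs without any dependencies. The property keys and the store manager class name are written from memory of this backend's sample configuration and should be checked against the version you depend on; the endpoint and table prefix are placeholder values.

```java
import java.util.Properties;

class DynamoDbGraphConfigSketch {

    // Builds backend settings for an embedded graph; both arguments are caller-supplied.
    static Properties dynamoDbBackendProperties(String endpoint, String tablePrefix) {
        Properties props = new Properties();
        // Store manager class from dynamodb-janusgraph-storage-backend (assumed name; verify for your version).
        props.setProperty("storage.backend",
                "com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager");
        // DynamoDB endpoint the backend should talk to (assumed key).
        props.setProperty("storage.dynamodb.client.endpoint", endpoint);
        // Prefix for the DynamoDB table names (assumed key).
        props.setProperty("storage.dynamodb.prefix", tablePrefix);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder endpoint and prefix, for illustration only.
        Properties props = dynamoDbBackendProperties(
                "https://dynamodb.us-east-1.amazonaws.com", "demo");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With JanusGraph and the backend on the classpath, each entry could then be passed to JanusGraphFactory.build().set(key, value) before calling open(), with no Gremlin Server involved.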

robertoandrade commented 7 years ago

I was able to deploy TitanDB (haven't tried JanusGraph yet) as an AWS Lambda for my project, and the only issue I have so far is cold-start times when configuring it to use a built-in ES index. I'm in the process of changing the configuration to use a separate AWS ES cluster to see how well it performs; if that is still not satisfactory, I'll try the tips offered here to bypass the Gremlin Server. What I'm wondering is: would I be able to query the graph using the gremlin-java API, or would I need to go lower level to issue queries against the graph?

robertoandrade commented 7 years ago

It looks like TitanDB wants to talk to ES using some sort of raw protocol on port 9200, and when I point it to AWS ES's HTTP/S ports it barks. I don't think AWS ES exposes the raw protocol to external consumers. Is there a way to configure Titan/Janus to use the HTTP/S interface exposed by AWS ES?

robertoandrade commented 7 years ago

OK, I came to the conclusion that the titan-es module used by Titan to communicate with ES only supports the Transport and Node clients, which talk raw TCP over ports 9200 and 9300. Since the "supported" REST client doesn't conform to the same interface as the other two clients (i.e., it doesn't inherit from AbstractClient), it would require quite a few changes to the Titan Elasticsearch backend to use the REST client to talk to the API that AWS ES exposes on ports 80/443.

I'm going to experiment with forking titan-es and adding a third client option, HTTP (instead of REST), using an alternative HTTP/REST client I found that maintains compatibility with the Transport/Node clients, and see what happens.

robertoandrade commented 7 years ago

Apparently the JanusGraph-ES module already supports this :\ so moving from Titan to JanusGraph has become a higher priority for me.
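
For reference, a rough sketch of what selecting the REST interface might look like in a JanusGraph index configuration. The keys and the REST_CLIENT value are recalled from the JanusGraph configuration reference and may differ between versions, so treat them as assumptions; the index name and hostname are made-up placeholders, and port and TLS settings are deliberately left out.

```java
import java.util.Properties;

class EsRestIndexConfigSketch {

    // Builds index settings that ask the Elasticsearch module to use its REST client.
    static Properties restIndexProperties(String indexName, String hostname) {
        Properties props = new Properties();
        String prefix = "index." + indexName + ".";
        props.setProperty(prefix + "backend", "elasticsearch");
        props.setProperty(prefix + "hostname", hostname);
        // Assumed key and value for choosing the HTTP-based client over the transport client.
        props.setProperty(prefix + "elasticsearch.interface", "REST_CLIENT");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical index name and AWS ES domain hostname, for illustration only.
        Properties props = restIndexProperties(
                "search", "search-example.us-east-1.es.amazonaws.com");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These entries would sit alongside the storage settings in the same graph configuration.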

greghines commented 7 years ago

I want to run JanusGraph without the Gremlin Server (hoping to speed up the Lambda function). @amcp, I understand that I can just depend on the dynamodb-janusgraph-storage-backend library once I have everything booted up (and just use Java code), but it seems that the problem is starting up the Gremlin Server in the first place. @robertoandrade, do you have any suggestions on how to bypass starting up the Gremlin Server?