amazon-archives / dynamodb-janusgraph-storage-backend

The Amazon DynamoDB Storage Backend for JanusGraph
Apache License 2.0

JanusGraph is slower when retrieving vertex properties. #235

Open pasalkarsachin1 opened 7 years ago

pasalkarsachin1 commented 7 years ago

Hi,

We have a graph with ~5-6K vertices; each vertex has 7-8 properties of different types such as String, List, and Set. We can fetch the vertices very quickly, but fetching all of their properties takes a considerable amount of time. I had a look at https://github.com/thinkaurelius/titan/issues/1335 and applied query.fast-property=true. Even so, it takes 30 seconds to fetch all properties.

Is there any way to get all properties when we fetch vertices?

amcp commented 7 years ago

@pasalkarsachin1 For only 7-8 small properties, the behavior here will depend on the size of these properties and the data model you chose. How does your RCU consumption on edgestore compare between fast-property=true and fast-property=false? The data model you choose for edge store will also be important. What did you select?

bendavidwhite commented 7 years ago

Hi @amcp

I'm seeing performance issues similar to @pasalkarsachin1's. We have a simple single-vertex, single-edge model. There are about 400 vertices loaded, each with anywhere between 5 and 100 connected edges.

Queries that traverse the network and return just a list of vertices or edges complete in under a second. However, when I append either .valueMap() or .values() to retrieve property values, the query time increases by a factor of 5-10. There are only two properties, and both are short strings of under 10 characters.
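To put a number on that slowdown, a small timing harness can help. The following is only a sketch: `median_seconds` and `slowdown` are hypothetical helper names, and the two `fake_*` functions are stand-ins that sleep instead of submitting a real traversal. In practice they would be replaced by callables that run the traversal without and with `.valueMap()`.

```python
import time
from statistics import median


def median_seconds(run_query, repeats=5):
    """Call run_query `repeats` times and return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return median(samples)


def slowdown(baseline, with_properties, repeats=5):
    """Ratio of median times: with_properties / baseline."""
    return median_seconds(with_properties, repeats) / median_seconds(baseline, repeats)


# Hypothetical stand-ins: replace with callables that submit the real
# traversals (one returning only vertices, one ending in .valueMap()).
def fake_vertices_only():
    time.sleep(0.01)


def fake_with_value_map():
    time.sleep(0.05)
```

Using the median rather than a single run reduces the effect of one-off pauses such as connection setup or cache warm-up.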

I've tried adding query.fast-property=true to dynamodb.properties, as well as increasing values such as storage.dynamodb.stores.edgestore.initial-capacity-read by a factor of 100. I've also increased the read capacity units for the jg_ tables in the DynamoDB console.
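For reference, the two settings mentioned above could be applied to a properties file with a helper along these lines. This is a minimal sketch: `set_properties` is a hypothetical helper, not part of JanusGraph, and the capacity value used in the usage note is made up for illustration.

```python
def set_properties(text, overrides):
    """Return properties text with `overrides` applied.

    Existing keys are rewritten in place; keys not yet present are appended.
    Comment lines (starting with # or !) and blank lines are left untouched.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Append any override whose key was not found in the original text.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For example, `set_properties(original_text, {"query.fast-property": "true", "storage.dynamodb.stores.edgestore.initial-capacity-read": "1000"})` returns the updated file contents, where `1000` is an illustrative value only.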

I feel like there's something fundamental I'm missing as when inspecting the item counts of the jg_ tables they are all zero! See below. This is despite always using the :remote command so that requests to add vertices/edges etc are sent to the Gremlin Server.

Thanks in advance for any help! Ben

[Screenshot, 2017-10-21: item counts of the jg_ tables]

bendavidwhite commented 7 years ago

Hi @pasalkarsachin1

Were you able to improve the performance of fetching vertex properties in the end?

bendavidwhite commented 7 years ago

My apologies, the performance issue I was experiencing was due to the EC2 instance being located in ap-south-1 whilst the DynamoDB jg_ tables actually used by the JanusGraph backend were in us-west-2. Doh! As suspected, a beginner's mistake.
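A quick sanity check for this kind of region mismatch might look like the sketch below. It assumes the backend's region is configured through the keys `storage.dynamodb.client.endpoint` and `storage.dynamodb.client.signing-region`; check the keys your own dynamodb.properties actually uses. The function names are hypothetical, and the instance region is passed in by the caller rather than looked up.

```python
import re

# Assumed property keys; verify against your own dynamodb.properties.
ENDPOINT_KEY = "storage.dynamodb.client.endpoint"
SIGNING_REGION_KEY = "storage.dynamodb.client.signing-region"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def configured_region(props):
    """Return the signing region if set, else the region in the endpoint host."""
    region = props.get(SIGNING_REGION_KEY)
    if region:
        return region
    # Regional endpoints have the form dynamodb.<region>.amazonaws.com
    match = re.search(r"dynamodb\.([a-z0-9-]+)\.amazonaws\.com",
                      props.get(ENDPOINT_KEY, ""))
    return match.group(1) if match else None


def region_mismatch(props, instance_region):
    """True when a region is configured and differs from the instance's region."""
    region = configured_region(props)
    return region is not None and region != instance_region
```

With an endpoint of `https://dynamodb.us-west-2.amazonaws.com` and an instance in `ap-south-1`, `region_mismatch` returns `True`, which is the situation described above: every property read then pays a cross-region round trip.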