amazon-archives / dynamodb-janusgraph-storage-backend

The Amazon DynamoDB Storage Backend for JanusGraph
Apache License 2.0

Ghost Vertex #270

Closed pasalkarsachin1 closed 6 years ago

pasalkarsachin1 commented 6 years ago

Hi,

I am using JanusGraph with DynamoDB as the storage backend, and I have started seeing a ghost vertex. When I opened the graph and ran the Gremlin query below for the vertex id, it returned nothing:

plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.tinkergraph
gremlin> g =GraphFactory.open('/home/user/janusgraph-0.2.0-hadoop2/bin/aws.properties')
==>standardjanusgraph[com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager:[127.0.0.1]]
gremlin> a=g.traversal()
==>graphtraversalsource[standardjanusgraph[com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager:[127.0.0.1]], standard]
gremlin> a.V(326859528)
gremlin> 

But when I query for edges attached to vertex id 326859528, I see multiple edges:

gremlin> a.V().bothE().where(otherV().hasId(326859528))
10:55:43 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>e[5xlkox-5elqfc-q39-5drgg8][326859528-hasValue->325446920]
==>e[5yow69-5elqfc-q39-5ozlds][326859528-hasValue->344302336]
==>e[5yowkh-5elqfc-q39-5pdm9s][326859528-hasValue->344956672]
==>e[5xq5a9-5elqfc-q39-5p1ji8][326859528-hasValue->344393216]
==>e[5yowbl-5elqfc-q39-5rat4w][326859528-hasValue->348184832]
==>e[5yowf5-5elqfc-q39-5oensw][326859528-hasValue->343325696]

When I query for the vertex again, it now shows up 🤔

gremlin> a.V(326859528)
==>v[326859528]

So I tried committing the transaction so that the newly visible vertex would be saved:

gremlin> g.tx().isOpen()
==>true
gremlin> g.tx().commit()
==>null
gremlin> g.tx().isOpen()
==>false
gremlin> 

Then I exited Gremlin and started a new session, but the vertex is gone again:

plugin activated: tinkerpop.tinkergraph
gremlin> g =GraphFactory.open('/home/user/janusgraph-0.2.0-hadoop2/bin/aws.properties')
==>standardjanusgraph[com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager:[127.0.0.1]]
gremlin> a=g.traversal()
==>graphtraversalsource[standardjanusgraph[com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager:[127.0.0.1]], standard]
gremlin> a.V(326859528)
gremlin> 

I also exported the graph and tried to import it on a different system, but the import fails with the error below:

[ERROR] 2018-05-07 16:57:15.316 [https-jsse-nio-8443-exec-6] c.s.p.e.h.ExceptionControllerAdvice - Internal Exception occurred while serving request {}
java.lang.IllegalStateException: Could not find outV with id [326859528] to create edge with id [5yowdd-5elqfc-q39-5p1m9s]
    at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoReader.lambda$null$2(GryoReader.java:107) ~[gremlin-core-3.2.3.jar:3.2.3]
    at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[?:1.8.0_102]
    at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoReader.lambda$readGraph$3(GryoReader.java:100) ~[gremlin-core-3.2.3.jar:3.2.3]
    at java.util.HashMap$EntrySet.forEach(HashMap.java:1043) ~[?:1.8.0_102]
    at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoReader.readGraph(GryoReader.java:100) ~[gremlin-core-3.2.3.jar:3.2.3]

CC: @amcp

amcp commented 6 years ago

Hi Sachin,

Please read the following two documents:

http://docs.janusgraph.org/latest/common-questions.html#_ghost_vertices
http://docs.janusgraph.org/latest/eventual-consistency.html#ghost-vertices

To sum up, ghost vertices will appear in some storage backends (as defined by JanusGraph), and DynamoDB is no exception. You need to either prevent them using key uniqueness (expensive), periodically sweep and remove them, or use soft deletes (recommended) and filter them out at the traversal level. You can also combine the second and third approaches: delete with soft deletes, then periodically sweep and remove the logically deleted entities.

Thank you,
Alex
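For illustration, the soft-delete and sweep approach described above might look roughly like this in the Gremlin console. This is only a sketch: the property key `deleted` is a hypothetical name chosen for the example, not something JanusGraph defines, and `g` and `a` are the graph and traversal source opened earlier in this thread.

```groovy
// Soft delete: flag the vertex instead of dropping it.
// 'deleted' is a hypothetical property key used only for this example.
a.V(326859528L).property('deleted', true).iterate()
g.tx().commit()

// Traversal-level filter: skip anything flagged as deleted.
a.V().hasNot('deleted').bothE().limit(10)

// Periodic sweep: physically remove the logically deleted vertices.
// Without an index on 'deleted' this iterates over all vertices.
a.V().has('deleted', true).drop().iterate()
g.tx().commit()
```

Every read traversal has to apply the `hasNot('deleted')` filter for this to work, so it is usually easier to wrap it in one helper than to repeat it by hand.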

pasalkarsachin1 commented 6 years ago

@amcp I see there is a class named GhostVertexRemover in JanusGraph. Is it useful here? Is there any documentation for it?