Open xdev-developer opened 7 years ago
Do you mean related to a different issue? This issue is #121
Oh, sorry, wrong issue number. Comment edited. After debugging testConcurrentConsistencyEnforcement, I can say this is the same problem.
For reference, this happened in our deployed code also.
Do we think this issue is specific to BerkeleyDB or might it also exist with Cassandra?
It exists with Cassandra too. I have three times more vertices than I have unique data. Beforehand I had created a unique index and added LOCK consistency to it, but on Spark, 96 threads did not care about it.
Hi, yeah, we are facing the same issue when vertex creation runs in parallel on Cassandra. Please let us know why the unique index does not work in the parallel case. Has this issue been fixed?
This problem also happens when I use HBase as the backend storage.
Same issue. Any ideas or updates?
I have the same problem when I use Spark to insert vertices. HBase is the backend storage.
Has anybody found a workaround? I don't think that checking the existence of the vertex before creation is a valid solution.
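For what it's worth, a common mitigation is the TinkerPop get-or-create pattern, which folds the lookup and the creation into a single traversal. Note this is still not atomic across competing transactions on an eventually consistent backend without LOCK consistency, so it narrows the race window rather than closing it. The label and property names below are just placeholders:

```groovy
// "Skill" / "skill_title" are placeholder names; without LOCK consistency
// this only reduces, not eliminates, the chance of duplicates.
v = g.V().has('Skill', 'skill_title', 'foo')
     .fold()
     .coalesce(
         unfold(),                                        // vertex already exists
         addV('Skill').property('skill_title', 'foo'))    // otherwise create it
     .next()
```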
BerkeleyDB does not actually implement the locking mechanism; it only enables the locking feature. BerkeleyDB also has no test suite for LockKeyColumnValueStoreTest.
JanusGraph has ExpectedValueCheckingStore, which implements locking for stores without native locking support. The default locking implementation is redundant for local stores:
- Locking is done in two stages: first between threads inside a shared process,
- and then between processes in a JanusGraph cluster.
Some examples where only the inter-process locking mechanism is implemented can be found here.
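To make the expected-value idea concrete, here is a minimal pure-Java sketch (an in-memory stand-in, not JanusGraph's actual ExpectedValueCheckingStore): before applying a mutation, the transaction re-reads the column and compares it to the value it saw earlier, aborting on any mismatch. All class and method names below are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch only, not JanusGraph's real classes: an
// expected-value check turns racy "check then write" into a
// detectable conflict at mutation time.
class ExpectedValueStoreSketch {
    static class MismatchException extends RuntimeException {
        MismatchException(String message) { super(message); }
    }

    private final Map<String, String> store = new HashMap<>();

    synchronized String read(String key) {
        return store.get(key);
    }

    // Re-read the column just before mutating and compare it to the value
    // the transaction observed earlier; abort if another transaction
    // committed in between.
    synchronized void mutateWithExpectedValue(String key, String expected, String newValue) {
        String actual = store.get(key);
        if (!Objects.equals(expected, actual)) {
            throw new MismatchException("Expected value mismatch for " + key
                    + ": expected=" + expected + " vs actual=" + actual);
        }
        store.put(key, newValue);
    }
}
```

If two transactions both read "no vertex exists" and then both try to commit, the second mutation fails the check, which is the same shape as the PermanentLockingException reported in this thread.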
Hi, I am also facing the same issue where the unique index constraint is not working, and the result is that multiple vertices get created while the index contains only one vertex. When I try to delete the vertices which are not in the index, it throws the following exception:

Caused by: org.janusgraph.diskstorage.locking.PermanentLockingException: Expected value mismatch for KeyColumn [k=0x 76- 15-191-228- 29-137-160- 79-114- 97- 99-108-101- 95- 66- 68- 77- 58- 47- 47- 81- 65- 49- 50- 67- 82- 49- 47- 83- 71- 83- 65- 77- 80- 76- 69- 68- 66- 47-114-101-102-116- 97-110-118-105- 47- 69- 77- 80- 78- 65- 77-197, c=0x 0]: expected=[ 24- 30- 57-200] vs actual=[ 17- 35- 28-152] (store=graphindex)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction.checkSingleExpectedValueUnsafe(ExpectedValueCheckingTransaction.java:289)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction.access$000(ExpectedValueCheckingTransaction.java:50)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction$1.call(ExpectedValueCheckingTransaction.java:227)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction$1.call(ExpectedValueCheckingTransaction.java:224)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction.checkSingleExpectedValue(ExpectedValueCheckingTransaction.java:224)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction.checkAllExpectedValues(ExpectedValueCheckingTransaction.java:193)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingTransaction.prepareForMutations(ExpectedValueCheckingTransaction.java:158)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:72)

Is there a way to delete the vertices which are not in the index?
Talking about Cassandra here (not familiar with BerkeleyDB):
Concurrent vertex creation should not be an issue if all competing transactions are on the same JVM, thanks to the inter-thread locking mechanism. Concurrent vertex creation can lead to inconsistencies if competing transactions are on different JVM instances, because the inter-process locking relies heavily on the data storage itself: competing transactions basically try writing an entry to Cassandra to claim the lock. Due to Cassandra's eventual consistency characteristics, it might be the case (just being hypothetical here) that more than one transaction thinks it has acquired the lock and then goes ahead with the vertex creation, leading to data duplicates.
Note that the documentation says:
The locking implementation is not robust against all failure scenarios. For instance, when a Cassandra cluster drops below quorum, consistency is no longer ensured. Hence, it is suggested to use locking-based consistency constraints sparingly with eventually consistent storage backends.
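If you do rely on locking with Cassandra, the lock behaviour is tunable via the `storage.lock.*` options. The values below are illustrative examples, not recommendations, and the exact value format (plain milliseconds vs. a duration string) depends on your JanusGraph version:

```properties
# Illustrative lock-related settings; consult the JanusGraph
# configuration reference for defaults and semantics.
storage.lock.wait-time = 200 ms
storage.lock.retries = 5
storage.lock.expiry-time = 5000 ms
```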
I am also facing the same issue where the unique index constraint is not working, and the result is that multiple vertices get created, but the index contains only one vertex.
Hi @priyanka211, can you give some sample code and context for how you create the vertices? Is it the case that you have multiple vertices with the same property value (which should be unique), but an index query returns only one vertex?
Hi all. The same issue =( Console output for info:
gremlin> g.V().has('Skill', 'skill_title', 'foo').elementMap()
==>{id=32808, label=Skill, skill_unique_names=foo, skill_id=e2d7a4f4-436a-44a1-b1c3-51fc5ed2037e, skill_title=foo}
gremlin> g.V().has('Skill', 'skill_title', 'Foo').elementMap()
==>{id=28712, label=Skill, skill_unique_names=foo, skill_id=14d25545-9d3e-4e8a-bd5a-2f689c4c48ea, skill_title=Foo}
gremlin> g.V().has('Skill', 'skill_unique_names', 'foo').elementMap()
==>{id=28712, label=Skill, skill_unique_names=foo, skill_id=14d25545-9d3e-4e8a-bd5a-2f689c4c48ea, skill_title=Foo}
graph.openManagement().printSchema()
...
Vertex Index Name | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
...
skillByTitleExact | Composite | true | internalindex | skill_title: ENABLED |
skillByUniqueNamesExact | Composite | true | internalindex | skill_unique_names: ENABLED | <-- set cardinality
I reproduce the bug with the following topologies:
The workflow is quite simple: rapid web requests -(n requests)-> remote client -(n requests)-> Janus
The requests are unrelated, so I can't use simple batching (maybe if all the requests were batched in one transaction there wouldn't be such an issue).
Sample code showing how the vertex is created:
public Vertex save(@Nonnull Context context) {
    final String label = context.property(LABEL_KEY);
    final Tuple<String, ?> id = requireNonNull(context.property(ID_KEY));
    try (GraphTraversal<?, Vertex> traversal = traversalSource.addV(label).property(id.getKey(), id.getValue())) {
        final Map<String, Object> vertexProperties = context.vertexProperties();
        vertexProperties.forEach((key, value) -> setProperty(traversal, key, value));
        return traversal.next();
    }
}
setProperty just performs some simple validation/cardinality logic.
@egetman So you are able to reproduce the problem with a single JanusGraph instance and a single Cassandra node? That sounds abnormal. Did you enable locking?
@li-boxuan no, I didn't.
@egetman That's the problem - you should enable locking. See https://docs.janusgraph.org/advanced-topics/eventual-consistency/#data-consistency
@li-boxuan Thanks! =) It was my bad - misconfiguration...
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7fec1b9d
gremlin> skillByUN = mgmt.getGraphIndex("skillByUniqueNamesExact")
==>skillByUniqueNamesExact
gremlin> mgmt.setConsistency(skillByUN, ConsistencyModifier.LOCK)
==>null
gremlin> mgmt.commit()
==>null
This update resolves the issue.
How does it work for Cassandra-type databases? What locking mechanism is used to achieve this?
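For Cassandra-like backends, JanusGraph uses a consistent-key locking scheme: a transaction writes a timestamped lock claim as a column under the lock key, reads the claims back, and considers itself the lock holder only if its own claim is the earliest. Below is a rough in-memory sketch of that "earliest claim wins" idea; it is not the real ConsistentKeyLocker (which also handles lock expiry, retries, and quorum reads), and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of a consistent-key style lock claim, loosely
// modeled on JanusGraph's approach: write a timestamped claim, read
// all claims back, and hold the lock only if yours is the earliest.
class LockClaimSketch {
    // lock key -> (claim timestamp -> claimant id); the TreeMap keeps
    // claims time-ordered, mirroring a sorted column scan in the store.
    private final Map<String, TreeMap<Long, String>> claims = new HashMap<>();

    // Step 1: write our claim column under the lock key.
    synchronized void writeClaim(String lockKey, long timestamp, String rid) {
        claims.computeIfAbsent(lockKey, k -> new TreeMap<>()).put(timestamp, rid);
    }

    // Step 2: read back all claims; we hold the lock only if ours is earliest.
    synchronized boolean holdsLock(String lockKey, String rid) {
        TreeMap<Long, String> cols = claims.get(lockKey);
        return cols != null && rid.equals(cols.firstEntry().getValue());
    }
}
```

This is also why the earlier comments about eventual consistency matter: if the read-back of claims does not see another instance's write, two instances can both believe their claim is the earliest.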
Hi all. The unique index does not work when vertex creation runs in parallel.
Result:
The additional vertex cannot be removed; the operation fails with status NOT_FOUND.