JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org

Error acquiring write lock while committing management transaction with bigtable backend #455

Open amyth opened 7 years ago

amyth commented 7 years ago

Hey, I am using JanusGraph with the Bigtable backend and am trying to create a property key. As soon as I commit the transaction using management.commit(), I get the following error:

org.janusgraph.core.JanusGraphException: Could not commit transaction due to exception during persistence
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1374)
    at org.janusgraph.graphdb.database.management.ManagementSystem.commit(ManagementSystem.java:235)
    at org.janusgraph.core.schema.JanusGraphManagement$commit$0.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:117)
    at groovysh_evaluate.run(groovysh_evaluate:3)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
    at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:70)
    at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:190)
    at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1215)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
    at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)
    at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
    at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1215)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:124)
    at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1215)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
    at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:152)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
    at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:455)
Caused by: org.janusgraph.core.JanusGraphException: Unexpected exception
    at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:798)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1363)
    ... 47 more
Caused by: org.janusgraph.diskstorage.locking.TemporaryLockingException: Temporary locking failure
    at org.janusgraph.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:309)
    at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStore.acquireLock(ExpectedValueCheckingStore.java:103)
    at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.acquireLock(KCVSProxy.java:52)
    at org.janusgraph.diskstorage.BackendTransaction.acquireIndexLock(BackendTransaction.java:255)
    at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:565)
    at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:694)
    ... 48 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Lock write retry count exceeded
    at org.janusgraph.diskstorage.locking.consistentkey.ConsistentKeyLocker.writeSingleLock(ConsistentKeyLocker.java:339)
    at org.janusgraph.diskstorage.locking.consistentkey.ConsistentKeyLocker.writeSingleLock(ConsistentKeyLocker.java:123)
    at org.janusgraph.diskstorage.locking.AbstractLocker.writeLock(AbstractLocker.java:304)
    ... 53 more

How to reproduce

To reproduce this, get JanusGraph up and running with the Bigtable storage backend and run the following in the Gremlin shell:

graph = JanusGraphFactory.open('conf/yourconf.properties')
management = graph.openManagement()
uid = management.makePropertyKey("uid").dataType(String.class).make()
management.commit()

Update: Version & Config Information

Configuration is as follows:

storage.backend=hbase

## Google cloud BIGTABLE configuration options
storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase1_x.BigtableConnection
storage.hbase.ext.google.bigtable.project.id=project-id
storage.hbase.ext.google.bigtable.instance.id=instance-id

storage.hostname=localhost
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

korjavin commented 4 years ago

any changes on this?

sparshneel commented 4 years ago

any update here?? I am getting exactly the same exception

porunov commented 4 years ago

The issue is related to the lock write time exceeding 100 ms. Most likely your Gremlin Server is located too far from the Bigtable servers. Please change 'storage.lock.wait-time' to a larger value (maybe something like 5000).
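
As a sketch of that suggestion, the relevant line in the graph's properties file might look like the fragment below. The value 5000 is only the example figure mentioned above, not a tuned recommendation; the option is interpreted in milliseconds, and a suitable value likely depends on the round-trip latency between the client and the storage backend.

```properties
# Time to wait for a lock write to be acknowledged by the storage backend (ms).
# The default is 100 ms; 5000 is just the example value suggested above.
storage.lock.wait-time=5000
```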

@mbrukman Do you think we should increase the default value to a bigger value (like 1-2 seconds instead of 100 ms) so that new users would not get this common exception?

sparshneel commented 4 years ago

@porunov I tried setting it to 5000. It's still not working; now I get the exception below:

org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT1M40S

By the way, I am using Cassandra as the backend store and Elasticsearch as the index store. See the configuration below:

tx.log-tx=true
tx.max-commit-time=15000

# Metrics
metrics.enabled=False
metrics.jmx.enabled=False

# cluster
cluster.max-partitions=32

# locking
storage.lock.expiry-time=3000
storage.lock.wait-time=5000
storage.lock.retries=10

# ids
ids.block-size=100000

porunov commented 4 years ago

Try to check those "repeated temporary exceptions". Maybe they contain some useful information. It is hard to answer without seeing the whole picture.

porunov commented 4 years ago

That being said, your "repeated temporary exceptions" are not related to this issue. Try to find help on the users Google group: https://groups.google.com/forum/#!forum/janusgraph-users

sparshneel commented 4 years ago

@porunov Thanks, it worked. The lock expiry time was not set correctly after I set the wait time to 5000.
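
The thread does not say what the corrected expiry value was. One plausible reading is that the configuration posted earlier (storage.lock.expiry-time=3000 with storage.lock.wait-time=5000) let locks expire before the wait period had elapsed. The following is a minimal sketch of a sanity check for that situation. The class name `LockSettingsCheck`, its helper methods, the fallback defaults, and the adjusted value of 300000 are all hypothetical and not part of JanusGraph; only the two property keys come from the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of JanusGraph: checks that the lock expiry
// time is longer than the lock wait time in a properties-style configuration.
public class LockSettingsCheck {
    // Fallbacks used when a key is absent (milliseconds). These are assumptions
    // for this sketch; check the JanusGraph configuration reference for the
    // defaults of the version you run.
    public static final long DEFAULT_WAIT_MS = 100L;
    public static final long DEFAULT_EXPIRY_MS = 300_000L;

    // Parses properties text such as the listings posted in this thread.
    public static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    // Reads a millisecond value, using the fallback when the key is missing.
    public static long readMillis(Properties p, String key, long fallback) {
        String v = p.getProperty(key);
        return v == null ? fallback : Long.parseLong(v.trim());
    }

    // Returns true when the expiry time is strictly longer than the wait time.
    public static boolean expiryExceedsWait(Properties p) {
        long wait = readMillis(p, "storage.lock.wait-time", DEFAULT_WAIT_MS);
        long expiry = readMillis(p, "storage.lock.expiry-time", DEFAULT_EXPIRY_MS);
        return expiry > wait;
    }

    public static void main(String[] args) {
        // Values as posted earlier in the thread.
        String reported = "storage.lock.expiry-time=3000\nstorage.lock.wait-time=5000\n";
        // Hypothetical adjustment: an expiry time well above the wait time.
        String adjusted = "storage.lock.expiry-time=300000\nstorage.lock.wait-time=5000\n";
        System.out.println(expiryExceedsWait(parse(reported))); // prints "false"
        System.out.println(expiryExceedsWait(parse(adjusted))); // prints "true"
    }
}
```

A check like this only catches the one inconsistency discussed here; it says nothing about whether the chosen values suit a particular cluster.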

mbrukman commented 4 years ago

@porunov wrote:

> @mbrukman Do you think we should increase the default value to a bigger value (like 1-2 seconds instead of 100 ms) so that new users would not get this common exception?

@kolea2 and @igorbernstein2 are looking into issues with Cloud Bigtable driver for JanusGraph; I'll let them comment on what they think is the best approach here.

igorbernstein2 commented 4 years ago

I'm not very familiar with janus graph, but have a bit of knowledge on the java client. What operation does this timeout map on to?

porunov commented 4 years ago

> I'm not very familiar with janus graph, but have a bit of knowledge on the java client. What operation does this timeout map on to?

It is the time to wait for a lock to be written to the storage backend. It is used when you define your schema (creating new vertex labels, edge labels, property keys, indices). I think it may also be used when you deal with locks in general (but I'm not sure about this one). For example, consider an index with a unique key on a storage backend that is eventually consistent: for eventually consistent storage backends, JanusGraph acquires a lock when writing new data that is covered by an index with a unique property defined.