amazon-archives / dynamodb-janusgraph-storage-backend

The Amazon DynamoDB Storage Backend for JanusGraph
Apache License 2.0

Lots of storage exceptions during heavy upserts in DynamoDB #204

Open miteshvp opened 7 years ago

miteshvp commented 7 years ago

Hi, we are seeing lots of storage exceptions when the system is under load. Our Gremlin query checks whether a vertex exists: if it does, the query updates its property; otherwise it creates a new vertex with that property. We are benchmarking the SINGLE vs. MULTI data models. Test runs with the MULTI data model completed without any issues. With the SINGLE data model, overall response times improve, but we also see many storage-related exceptions (in parallelMutate). Is there any solution or workaround?
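One generic client-side workaround is to retry a failed commit with exponential backoff while the root cause is investigated. The sketch below is not part of the storage backend or Gremlin Server. `CommitRetry` and `retryWithBackoff` are hypothetical names. It assumes the commit is wrapped in a `Callable` that throws an unchecked exception on backend failure, and that the upsert can safely be re-run. Because the trace reports that the thread "was interrupted", a retry only helps if the interrupt is transient. If the thread's interrupt flag is still set, `Thread.sleep` throws `InterruptedException` and the loop stops.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a task with exponential backoff. */
final class CommitRetry {

    private CommitRetry() {}

    /**
     * Calls {@code task}, retrying on RuntimeException up to {@code maxAttempts}
     * times in total and doubling the pause after each failure.
     */
    static <T> T retryWithBackoff(Callable<T> task, int maxAttempts, long baseDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (RuntimeException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    // Pause baseDelayMs, 2x, 4x, ...; throws InterruptedException
                    // immediately if this thread has already been interrupted.
                    Thread.sleep(baseDelayMs << attempt);
                }
            }
        }
        throw last; // every attempt failed: surface the last backend error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated commit that fails twice before succeeding.
        String result = retryWithBackoff(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("simulated backend failure");
            }
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `committed after 3 attempts`. Retrying hides transient failures but does not fix a data-model mismatch, so it is best treated as a stopgap.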

105219 [gremlin-server-exec-3] ERROR com.thinkaurelius.titan.graphdb.database.StandardTitanGraph  - Could not commit transaction [99] due to storage exception in commit
com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:87)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:141)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:198)
    at com.thinkaurelius.titan.diskstorage.BackendTransaction.commitStorage(BackendTransaction.java:119)
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.commit(StandardTitanGraph.java:718)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1352)
    at com.thinkaurelius.titan.graphdb.tinkerpop.TitanBlueprintsGraph$GraphTransaction.doCommit(TitanBlueprintsGraph.java:263)
    at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:105)
    at org.apache.tinkerpop.gremlin.server.GraphManager.lambda$commitAll$2(GraphManager.java:122)
    at java.util.concurrent.ConcurrentHashMap$EntrySetView.forEach(ConcurrentHashMap.java:4795)
    at org.apache.tinkerpop.gremlin.server.GraphManager.commitAll(GraphManager.java:119)
    at org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler.attemptCommit(HttpGremlinEndpointHandler.java:476)
    at org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler.lambda$channelRead$1(HttpGremlinEndpointHandler.java:245)
    at org.apache.tinkerpop.gremlin.util.function.FunctionUtils.lambda$wrapFunction$0(FunctionUtils.java:36)
    at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$2(GremlinExecutor.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Permanent exception while executing backend operation CacheMutation
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
    ... 19 more
Caused by: com.amazon.titan.diskstorage.dynamodb.BackendRuntimeException: was interrupted during parallelMutate
    at com.amazon.titan.diskstorage.dynamodb.DynamoDBDelegate.parallelMutate(DynamoDBDelegate.java:301)
    at com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager.mutateMany(DynamoDBStoreManager.java:194)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:90)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:87)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
    ... 20 more
105219 [gremlin-server-exec-3] ERROR com.thinkaurelius.titan.graphdb.database.StandardTitanGraph  - Could not commit transaction [99] due to exception
com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:87)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:141)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:198)
    at com.thinkaurelius.titan.diskstorage.BackendTransaction.commitStorage(BackendTransaction.java:119)
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.commit(StandardTitanGraph.java:718)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1352)
    at com.thinkaurelius.titan.graphdb.tinkerpop.TitanBlueprintsGraph$GraphTransaction.doCommit(TitanBlueprintsGraph.java:263)
    at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:105)
    at org.apache.tinkerpop.gremlin.server.GraphManager.lambda$commitAll$2(GraphManager.java:122)
    at java.util.concurrent.ConcurrentHashMap$EntrySetView.forEach(ConcurrentHashMap.java:4795)
    at org.apache.tinkerpop.gremlin.server.GraphManager.commitAll(GraphManager.java:119)
    at org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler.attemptCommit(HttpGremlinEndpointHandler.java:476)
    at org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler.lambda$channelRead$1(HttpGremlinEndpointHandler.java:245)
    at org.apache.tinkerpop.gremlin.util.function.FunctionUtils.lambda$wrapFunction$0(FunctionUtils.java:36)
    at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$2(GremlinExecutor.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Permanent exception while executing backend operation CacheMutation
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
    ... 19 more
Caused by: com.amazon.titan.diskstorage.dynamodb.BackendRuntimeException: was interrupted during parallelMutate
    at com.amazon.titan.diskstorage.dynamodb.DynamoDBDelegate.parallelMutate(DynamoDBDelegate.java:301)
    at com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager.mutateMany(DynamoDBStoreManager.java:194)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:90)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:87)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
    ... 20 more
105220 [gremlin-server-exec-3] WARN  org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler  - Invalid request - responding with 500 Internal Server Error and was interrupted during parallelMutate
com.amazon.titan.diskstorage.dynamodb.BackendRuntimeException: was interrupted during parallelMutate
    at com.amazon.titan.diskstorage.dynamodb.DynamoDBDelegate.parallelMutate(DynamoDBDelegate.java:301)
    at com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager.mutateMany(DynamoDBStoreManager.java:194)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:90)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:87)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56)
    at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:87)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:141)
    at com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:198)
    at com.thinkaurelius.titan.diskstorage.BackendTransaction.commitStorage(BackendTransaction.java:119)
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.commit(StandardTitanGraph.java:718)
    at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1352)
    at com.thinkaurelius.titan.graphdb.tinkerpop.TitanBlueprintsGraph$GraphTransaction.doCommit(TitanBlueprintsGraph.java:263)
    at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:105)
    at org.apache.tinkerpop.gremlin.server.GraphManager.lambda$commitAll$2(GraphManager.java:122)
    at java.util.concurrent.ConcurrentHashMap$EntrySetView.forEach(ConcurrentHashMap.java:4795)
    at org.apache.tinkerpop.gremlin.server.GraphManager.commitAll(GraphManager.java:119)
    at org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler.attemptCommit(HttpGremlinEndpointHandler.java:476)
    at org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler.lambda$channelRead$1(HttpGremlinEndpointHandler.java:245)
    at org.apache.tinkerpop.gremlin.util.function.FunctionUtils.lambda$wrapFunction$0(FunctionUtils.java:36)
    at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$2(GremlinExecutor.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
108212 [gremlin-server-worker-1] ERROR org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler  - Error processing HTTP Request
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:748)
amcp commented 7 years ago

What could be happening is that your graph has too high a degree (too many properties/edges per vertex on average) for the SINGLE item data model. The SINGLE item data model is meant for low-degree graphs. See the tests testEdgesExceedCacheSize and testVertexCentricQuery in SingleDynamoDBGraphTest.java for examples of this limitation in play.
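One concrete limit behind the degree argument is that DynamoDB caps a single item at 400 KB. Assuming SINGLE packs all columns (properties and edges) of a vertex into one item, while MULTI stores each column as its own item, only SINGLE item size grows with degree. The back-of-the-envelope sketch below models this. `DataModelCapacity` is a hypothetical name, the key and per-column sizes are made-up assumptions, and the backend's real binary serialization adds overhead that is not modeled here.

```java
/**
 * Back-of-the-envelope model of DynamoDB item sizes under the SINGLE and MULTI
 * data models. Key and column sizes are hypothetical; real serialization adds overhead.
 */
final class DataModelCapacity {

    /** DynamoDB's maximum item size, 400 KB, taken here as 400 * 1024 bytes. */
    static final long MAX_ITEM_BYTES = 400L * 1024L;

    private DataModelCapacity() {}

    /** SINGLE: the key plus every column of the vertex share one item. */
    static long singleItemBytes(long columns, long avgColumnBytes, long keyBytes) {
        return keyBytes + columns * avgColumnBytes;
    }

    /** MULTI: each column is its own item, so item size does not grow with degree. */
    static long multiItemBytes(long avgColumnBytes, long keyBytes) {
        return keyBytes + avgColumnBytes;
    }

    /** Largest column count (degree) that still fits in one SINGLE item. */
    static long maxColumnsForSingle(long avgColumnBytes, long keyBytes) {
        return (MAX_ITEM_BYTES - keyBytes) / avgColumnBytes;
    }

    public static void main(String[] args) {
        long keyBytes = 16;     // assumed size of the vertex key
        long columnBytes = 100; // assumed average size of one property or edge column
        long degree = 5_000;    // assumed number of columns on a busy vertex

        long single = singleItemBytes(degree, columnBytes, keyBytes);
        System.out.println("max SINGLE degree: " + maxColumnsForSingle(columnBytes, keyBytes));
        System.out.println("SINGLE item at degree " + degree + ": " + single
                + " bytes, fits = " + (single <= MAX_ITEM_BYTES));
        System.out.println("each MULTI item: " + multiItemBytes(columnBytes, keyBytes) + " bytes");
    }
}
```

With these assumed numbers, a SINGLE item holds at most 4095 columns, so a vertex of degree 5000 would not fit. Each MULTI item stays at 116 bytes regardless of degree.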

miteshvp commented 7 years ago

If it helps, I have two nodes, shown below, with one outgoing edge between them, so I highly doubt I have a high degree per vertex.

Item Node
{
    "requestId": "c01a736a-7605-4656-b27e-f7839ae60927",
    "status": {
        "message": "",
        "code": 200,
        "attributes": {}
    },
    "result": {
        "data": [{
            "ilast_year_closed": [18],
            "plast_year_closed": [14],
            "ilast_month_opened": [17],
            "last_updated": ["never"],
            "ecosystem": ["test"],
            "plast_month_closed": [15],
            "vertex_label": ["Item"],
            "plast_year_opened": [12],
            "ilast_year_opened": [16],
            "ilast_month_closed": [19],
            "latest": ["x.x.y"],
            "name": ["cdcb0378-a7de-4b2e-a079-fe48929164fb2351"],
            "stars": [21],
            "relative_used": ["not used"],
            "spn": [20],
            "deps_count": [-1],
            "plast_month_opened": [13]
        }],
        "meta": {}
    }
}

Desc Node
{
    "requestId": "aabab6c9-391e-4a62-9043-01d7d69c544f",
    "status": {
        "message": "",
        "code": 200,
        "attributes": {}
    },
    "result": {
        "data": [{
            "last_updated": ["never"],
            "shipped": [false],
            "pname": ["e23f7f9c-4940-4ffe-8e2b-fa5370a0041c362"],
            "vertex_label": ["Desc"],
            "description": ["No Description"],
            "cveds": ["2014-15:10"],
            "version": ["x.x.x"],
            "deps_count": [-1],
            "licenses": ["NOLIC"],
            "complexity": [1],
            "pecosystem": ["test"],
            "locs": [100],
            "file": [1024]
        }],
        "meta": {}
    }
}
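A rough estimate supports the point that these vertices are small. The sketch below sums the UTF-8 byte lengths of the Item node's property names and non-numeric values, and assumes 8 bytes per number. The result is a few hundred bytes, far below DynamoDB's 400 KB item limit. `VertexSizeEstimate` is a hypothetical helper. It ignores the backend's actual serialization format and the edge column, so it is only an order-of-magnitude check.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: rough size estimate of a vertex's property payload. */
final class VertexSizeEstimate {

    static final long MAX_ITEM_BYTES = 400L * 1024L; // DynamoDB item limit (400 KB)
    static final int ASSUMED_NUMBER_BYTES = 8;        // assumption: each number costs 8 bytes

    private VertexSizeEstimate() {}

    /** Sums UTF-8 lengths of names and non-numeric values, plus a fixed cost per number. */
    static long estimateBytes(Map<String, Object> properties) {
        long total = 0;
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            Object value = e.getValue();
            if (value instanceof Number) {
                total += ASSUMED_NUMBER_BYTES;
            } else {
                total += String.valueOf(value).getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return total;
    }

    /** Properties of the Item node shown above (single-valued lists unwrapped). */
    static Map<String, Object> itemNode() {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("ilast_year_closed", 18);
        p.put("plast_year_closed", 14);
        p.put("ilast_month_opened", 17);
        p.put("last_updated", "never");
        p.put("ecosystem", "test");
        p.put("plast_month_closed", 15);
        p.put("vertex_label", "Item");
        p.put("plast_year_opened", 12);
        p.put("ilast_year_opened", 16);
        p.put("ilast_month_closed", 19);
        p.put("latest", "x.x.y");
        p.put("name", "cdcb0378-a7de-4b2e-a079-fe48929164fb2351");
        p.put("stars", 21);
        p.put("relative_used", "not used");
        p.put("spn", 20);
        p.put("deps_count", -1);
        p.put("plast_month_opened", 13);
        return p;
    }

    public static void main(String[] args) {
        Map<String, Object> item = itemNode();
        long bytes = estimateBytes(item);
        System.out.printf("%d properties, ~%d bytes (%.4f%% of the item limit)%n",
                item.size(), bytes, 100.0 * bytes / MAX_ITEM_BYTES);
    }
}
```

Running `main` reports 17 properties and an estimate well under 1 KB. If payloads this small still fail under SINGLE, the cause is more likely elsewhere (for example, the interruption itself) than item size.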