JanusGraph / janusgraph-docker

JanusGraph Docker images

.JanusGraphException: A JanusGraph graph with the same instance id is already open #64

Open winaterion opened 3 years ago

winaterion commented 3 years ago

I deployed the janusgraph:latest container to Kubernetes using a StatefulSet. It worked fine in the pod at first, but sometimes it fails and cannot recover. In the exception below I found this line:

: org.janusgraph.core.JanusGraphException: A JanusGraph graph with the same instance id [0a050039216-janusgraph-0-janusgraph-service-default-svc-cluster-local1] is already open. Might required forced shutdown.
    at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:173)

I have no idea where this instance id comes from, and I can't understand why this happens out of nowhere. My guess is that the pod was restarted or the container crashed, and that caused it.

This is the JanusGraph startup log, including the exception:

1530 [main] INFO  org.janusgraph.diskstorage.Backend  - Initiated backend operations thread pool of size 2
1590 [main] WARN  org.apache.tinkerpop.gremlin.server.GremlinServer  - Graph [graph] configured at [/etc/opt/janusgraph/janusgraph.properties] could not be instantiated and will not be available in Gremlin Server.  GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:81)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:69)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:103)
    at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.lambda$new$0(DefaultGraphManager.java:57)
    at java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
    at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.<init>(DefaultGraphManager.java:55)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:80)
    at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:122)
    at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:86)
    at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:345)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:77)
    ... 13 more
Caused by: org.janusgraph.core.JanusGraphException: A JanusGraph graph with the same instance id [0a050039216-janusgraph-0-janusgraph-service-default-svc-cluster-local1] is already open. Might required forced shutdown.
    at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:173)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:161)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:132)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:112)
    ... 18 more
1592 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized Gremlin thread pool.  Threads in pool named with pattern gremlin-*
1634 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
4816 [gremlin-server-exec-1] ERROR org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager  - Could not create GremlinScriptEngine for gremlin-groovy
java.lang.IllegalStateException: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
    at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.lambda$createGremlinScriptEngine$16(DefaultGremlinScriptEngineManager.java:464)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.createGremlinScriptEngine(DefaultGremlinScriptEngineManager.java:450)
    at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.getEngineByName(DefaultGremlinScriptEngineManager.java:219)
    at org.apache.tinkerpop.gremlin.jsr223.CachedGremlinScriptEngineManager.lambda$getEngineByName$0(CachedGremlinScriptEngineManager.java:57)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at org.apache.tinkerpop.gremlin.jsr223.CachedGremlinScriptEngineManager.getEngineByName(CachedGremlinScriptEngineManager.java:57)
    at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:267)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
    at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:378)
    at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
    at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.lambda$createGremlinScriptEngine$16(DefaultGremlinScriptEngineManager.java:460)
    ... 24 more
Caused by: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
    at org.apache.tink

farodin91 commented 3 years ago

By default, instance ids are generated automatically. You can now use graph.replace-instance-if-exists=true to replace an old instance with the same id, or you could use one of the following configs, in combination with the other config, to set the instance id yourself:

Does this help?

kingzbauer commented 2 years ago

@farodin91 This worked for me

porunov commented 2 years ago

With a StatefulSet, your hostname might be the same after redeployment. If you didn't properly close your JanusGraph instance and simply force-restarted the pod, the instance id is not removed from the set of open instances, so you need to remove the zombie instance. You can use the following approach to force-close a zombie instance (never close non-zombie instances):

// instanceName is the id of the zombie instance, as printed in the error message
mgnm = janusGraph.openManagement();
mgnm.forceCloseInstance(instanceName);
mgnm.commit();

That said, in your situation, as you are using a StatefulSet, it might be easier to use the following configuration:

graph.replace-instance-if-exists=true

With this configuration, you won't need to manually force-close zombie instances when a pod stops non-gracefully. If the pod has the same hostname (or graph.unique-instance-id) after a restart, the redeployed pod with the same name replaces the zombie instance. This eliminates zombie instances unless your graph id changes for some reason or the number of pods decreases.
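To make the mechanism above concrete, here is a simplified Python model of it. This is not JanusGraph's code: `InstanceRegistry`, `InstanceAlreadyOpenError`, and `simulate_restart` are hypothetical names, and the registry is only a toy stand-in for the set of open instance ids. It shows why a pod that is killed without a clean shutdown cannot start again under the same id, and how a replace-if-exists flag changes that.

```python
class InstanceAlreadyOpenError(Exception):
    """Raised when an instance id is still registered as open."""


class InstanceRegistry:
    """Toy stand-in for the set of open instance ids (hypothetical, not JanusGraph code)."""

    def __init__(self):
        self.open_ids = set()

    def open(self, instance_id, replace_if_exists=False):
        if instance_id in self.open_ids:
            if not replace_if_exists:
                raise InstanceAlreadyOpenError(
                    f"instance id [{instance_id}] is already open")
            # Models graph.replace-instance-if-exists=true: drop the stale entry.
            self.open_ids.discard(instance_id)
        self.open_ids.add(instance_id)

    def close(self, instance_id):
        # A graceful shutdown removes the id; a killed pod never gets here.
        self.open_ids.discard(instance_id)

    def force_close(self, instance_id):
        # Models the manual cleanup of a zombie entry.
        self.open_ids.discard(instance_id)


def simulate_restart(replace_if_exists):
    """Start once, 'crash' without close(), then start again under the same id."""
    registry = InstanceRegistry()
    registry.open("janusgraph-0")
    # The pod is killed here, so close() is never called and the id stays registered.
    try:
        registry.open("janusgraph-0", replace_if_exists=replace_if_exists)
        return "started"
    except InstanceAlreadyOpenError:
        return "failed"
```

In this model, `simulate_restart(False)` reproduces the failure from the original report, while `simulate_restart(True)` lets the restarted pod take over the stale entry.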

ManfredLange commented 2 years ago

@farodin91 Your suggestion to use the config graph.replace-instance-if-exists=true worked. Thank you, Jan!

I just had to experiment a little to figure out how to use your option in a multi-container environment that is set up with a docker-compose file.

Here is what that section in my docker-compose file looks like. Perhaps sharing this helps others resolve this issue faster:

version: '3.7'

services:
   janusgraph:
      container_name: optarix-janusgraph
      hostname: janusgraph.local
      image: janusgraph/janusgraph:0.6.1
         # Last check of version number on 23 Apr 2022
         # More recent images may be available at https://hub.docker.com/r/janusgraph/janusgraph/tags?page=1&ordering=last_updated
      ports:
         - 8182:8182
      environment:
         # Configuration options for JanusGraph described at:
         # https://docs.janusgraph.org/configs/configuration-reference
         - graph.replace-instance-if-exists=true # addresses error described at 

The complete set of JanusGraph config options that can be passed as environment variables, as illustrated in this docker-compose file snippet, can be found at https://docs.janusgraph.org/configs/configuration-reference

Note that I didn't always see the reported issue; it was intermittent. My hypothesis is that when the Docker container shut down, JanusGraph / TinkerPop didn't shut down correctly in all cases, possibly because of a race condition. Note that I let Docker initiate the shutdown. In these cases I used docker-compose down, which I believe is not considered a forceful shutdown but the regular way of shutting down a container.

I'm wondering whether the setting graph.replace-instance-if-exists=true should be the default for a stand-alone container, or whether it could at least be mentioned somewhere more prominently. The way the issue presented itself made me assume that the container image was not mature enough, so I never moved beyond basic trials. I acknowledge that my view was caused by my lack of knowledge / experience / understanding. I'd love to see graph databases like JanusGraph become more accessible to a wider audience.

Update 05 June 2022: When providing config parameters as environment variables in a docker-compose file, it's necessary to prefix the environment variable names with "janusgraph." See my later answer with an example of providing such an environment variable correctly.
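The prefix rule from this update can be sketched in a few lines of Python. The image's real startup script is not shown in this thread, so this is only an illustration of the rule as described here: `properties_from_env` is a hypothetical helper, and it takes a plain dict instead of reading the real environment.

```python
PREFIX = "janusgraph."


def properties_from_env(env):
    """Return 'key=value' lines for variables that carry the janusgraph. prefix.

    Variables without the prefix are ignored, which is why the unprefixed
    name from my first snippet had no effect.
    """
    lines = []
    for name, value in sorted(env.items()):
        if name.startswith(PREFIX):
            # Strip the outer namespace to get the JanusGraph config key.
            lines.append(f"{name[len(PREFIX):]}={value}")
    return lines
```

For example, passing `{"janusgraph.graph.replace-instance-if-exists": "true", "graph.replace-instance-if-exists": "true"}` yields a single line, `graph.replace-instance-if-exists=true`, coming from the prefixed variable only.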

ManfredLange commented 2 years ago

I tried the same. However, according to the documentation at https://github.com/JanusGraph/janusgraph-docker#configuration, you need to use an outer namespace janusgraph, as follows:

services:
   janusgraph:
      container_name: optarix-janusgraph
      hostname: janusgraph.local
      image: janusgraph/janusgraph:0.6.1
         # Last check of version number on 23 Apr 2022
         # More recent images may be available at https://hub.docker.com/r/janusgraph/janusgraph/tags?page=1&ordering=last_updated
      ports:
         - 8182:8182
      environment:
         # Configuration options for JanusGraph described at:
         # https://docs.janusgraph.org/configs/configuration-reference
         - janusgraph.graph.replace-instance-if-exists=true

With that in place, I can open a terminal in the janusgraph container and cat the file with the generated properties:

cat /etc/opt/janusgraph/janusgraph.properties

In my case, this yields the following content for that file:

#
# NOTE: THIS FILE IS GENERATED VIA "update.sh"
# DO NOT EDIT IT DIRECTLY; CHANGES WILL BE OVERWRITTEN.
#
# Copyright 2021 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# JanusGraph configuration sample: BerkeleyDB JE and Apache Lucene

gremlin.graph = org.janusgraph.core.JanusGraphFactory

# The primary persistence provider used by JanusGraph.  This is required.  It
# should be set one of JanusGraph's built-in shorthand names for its standard
# storage backends or to the full package and classname of a custom/third-party
# StoreManager implementation.
#
# Default:    (no default value)
# Data Type:  String
# Mutability: LOCAL
storage.backend = berkeleyje

# Storage directory for those storage backends that require local storage.
#
# Default:    (no default value)
# Data Type:  String
# Mutability: LOCAL
storage.directory = /var/lib/janusgraph/data

# The indexing backend used to extend and optimize JanusGraph's query
# functionality. This setting is optional.  JanusGraph can use multiple
# heterogeneous index backends.  Hence, this option can appear more than
# once, so long as the user-defined name between "index" and "backend" is
# unique among appearances.Similar to the storage backend, this should be
# set to one of JanusGraph's built-in shorthand names for its standard
# index backends (shorthands: lucene, elasticsearch, es, solr) or to the
# full package and classname of a custom/third-party IndexProvider
# implementation.
#
# Default:    elasticsearch
# Data Type:  String
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
index.search.backend = lucene

# Directory to store index data locally
#
# Default:    (no default value)
# Data Type:  String
# Mutability: MASKABLE
index.search.directory = /var/lib/janusgraph/index
graph.replace-instance-if-exists=true

Notice the very last entry in this file, which is what is configured in the docker-compose file.
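If you would rather check this with a script than by eye, a minimal sketch in Python could look like the following. `parse_properties` is a hypothetical helper that only handles the `key = value` and `#` comment forms seen in the file above, not the full Java properties syntax, and `sample` is just the tail of that file copied in as a string.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Tail of the generated janusgraph.properties shown above.
sample = """\
# Mutability: MASKABLE
index.search.directory = /var/lib/janusgraph/index
graph.replace-instance-if-exists=true
"""

props = parse_properties(sample)
```

Looking up `props.get("graph.replace-instance-if-exists")` then tells you whether the environment variable made it into the generated file.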