JanusGraph / janusgraph-docker

JanusGraph Docker images
Other
98 stars 87 forks source link

Unable to set gremlin server options that are not simple types (e.g. 'graphs') #44

Closed mkaisercross closed 3 years ago

mkaisercross commented 4 years ago

I'm trying to setup the JG docker image to use ConfiguredGraphFactory. Not sure if I am doing it correctly but anyway it seems there is an issue with the docker configuration options that aren't simple strings.

docker run --rm -it -e gremlinserver.graphs={ConfigurationManagementGraph:/etc/opt/janusgraph/janusgraph.properties} janusgraph/janusgraph:0.4.0 janusgraph show-config

See the graphs object in the output.

Output

# contents of /etc/opt/janusgraph/janusgraph.properties
#
# NOTE: THIS FILE IS GENERATED VIA "update.sh"
# DO NOT EDIT IT DIRECTLY; CHANGES WILL BE OVERWRITTEN.
#
# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=/var/lib/janusgraph/data
index.search.backend=lucene
index.search.directory=/var/lib/janusgraph/index
---------------------------------------
# contents of /etc/opt/janusgraph/gremlin-server.yaml
# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {ConfigurationManagementGraph:/etc/opt/janusgraph/janusgraph.properties}
  graph: /etc/opt/janusgraph/janusgraph.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536

threadPoolWorker: 1
gremlinPool: 8
mkaisercross commented 4 years ago

What about using something like yq YAML Processor

FlorianHockmann commented 4 years ago

Looks like modifying nested values in the Gremlin Server yaml is currently really not supported. Such a YAML processer sounds like a good idea to me.

@nvizo Do you want to provide a pull request for this?

@sjudeng Do you have any additional insights here since you added the configuration parsing if I remember it correctly?

mkaisercross commented 4 years ago

@FlorianHockmann I'd be happy to help with this. It looks like yq isn't available in the openjdk (alpine) image and is not available in the alpine package manager so the installation would have to be from source or prebuilt binaries. It looks like they have prebuilt binaries hosted on Github. Downloading a specific architecture binary would add the least amount of time to the build time while keeping the final image as small as possible. I will look into this to see what is possible.

FlorianHockmann commented 4 years ago

Great to hear that you're planning to contribute. Just note that you need to sign the contributor license agreement (CLA) if you haven't already, but that is described in CONTRIBUTING.md together with a few other aspects of the contributing workflow.

is not available in the alpine package manager

While there are plans to move to an Alpine based image (see #27), we currently use the Debian based image openjdk:8-jdk. The alpine alternative would be openjdk:8-jdk-alpine if I'm not mistaken. So this could make it easier to add this dependency.

mkaisercross commented 4 years ago

Thanks for the info on contributing and openjdk. I thought I saw somewhere that openjdk was an alpine based image but I see now it's Debian. Do you think it is better to use apt or to download the binaries directly. I already implemented it already using the latter but it's not a big deal to switch to apt. I'm thinking I probably should because if we support other architectures in the future it may cause an issue.

Should I update all version directories with the changes i.e. 0.2, 0.3, 0.4? It is a bug fix and the changes should be backwards compatible since it is basically a superset of the same syntax so I am thinking yes but let me know if you disagree.

Overall the changes make it a lot easier to edit the configs. For example here is a somewhat complicated config I was able to test that works:

docker run --rm -it -e gremlinserver.scriptEngines.gremlin-groovy.plugins["org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin"].files[+]=scripts/another-script.groovy janusgraph/janusgraph:0.4.0 janusgraph show-config

Output

# contents of /etc/opt/janusgraph/janusgraph.properties
storage:
  directory: /var/lib/janusgraph/data
index:
  search:
    directory: /var/lib/janusgraph/index
---------------------------------------
# contents of /etc/opt/janusgraph/gremlin-server.yaml
host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs:
  graph: conf/gremlin-server/janusgraph-cql-es-server.properties
scriptEngines:
  gremlin-groovy:
    plugins:
      org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {}
      org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {}
      org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {}
      org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin:
        classImports:
        - java.lang.Math
        methodImports:
        - java.lang.Math#*
      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin:
        files:
        - scripts/empty-sample.groovy
        - scripts/another-script.groovy
serializers:
- className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
  config:
    ioRegistries:
    - org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry
- className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
  config:
    serializeResultToString: true
- className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
  config:
    ioRegistries:
    - org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry
- className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
  config:
    ioRegistries:
    - org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry
- className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0
  config:
    ioRegistries:
    - org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry
- className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
  config:
    serializeResultToString: true
- className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0
  config:
    ioRegistries:
    - org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry
- className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
  config:
    ioRegistries:
    - org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0
- className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
  config:
    ioRegistries:
    - org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0
processors:
- className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor
  config:
    sessionTimeout: 28800000
- className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor
  config:
    cacheExpirationTime: 600000
    cacheMaxSize: 1000
metrics:
  consoleReporter:
    enabled: true
    interval: 180000
  csvReporter:
    enabled: true
    interval: 180000
    fileName: /tmp/gremlin-server-metrics.csv
  jmxReporter:
    enabled: true
  slf4jReporter:
    enabled: true
    interval: 180000
  gangliaReporter:
    enabled: false
    interval: 180000
    addressingMode: MULTICAST
  graphiteReporter:
    enabled: false
    interval: 180000
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
threadPoolWorker: 1
graph: /etc/opt/janusgraph/janusgraph.properties
gremlinPool: 8

Regarding the updates to the README, I made a very minimal addition to the table that shows how to update nested values. But I think it may be good to either reference the yq project and/or create a new section in the README that describes some more advanced configurations. For example if someone wants to append a new serializer to the config you cannot use the [+] syntax because the item has multiple properties. Instead you must specify the index manually in two different env vars like so:

docker run --rm -it -e gremlinserver.serializers[10].config=test -e gremlinserver.serializers[10].className=test janusgraph/janusgraph:0.4.0 janusgraph show-config

Do you think we should create a section in our documentation or just simply add a reference to the yq project?

mkaisercross commented 4 years ago

@FlorianHockmann I think I have completed the fix for this issue. I created a pull request and followed all the instructions in the CONTRIBUTING.md. TravisCI is passing but there is one remaining Codacy issue that look like a false positive. I am not sure how to fix it since the code it is flagging is necessary and the issue reported (reference before assignment) is not actually happening. Can this be manually overridden?

FlorianHockmann commented 4 years ago

Do you think it is better to use apt or to download the binaries directly. I already implemented it already using the latter but it's not a big deal to switch to apt. I'm thinking I probably should because if we support other architectures in the future it may cause an issue.

I personally would be OK with both approaches but I have a slight preference for the apt approach as it's probably more stable and consistent with the way we install dependencies in general. Regarding other architectures: I wouldn't use that as a deciding factor for your PR now as we don't even know if and when we will support other architectures. But I see that you already used apt in your PR now.

I suggest that we move the discussion on the concrete contribution now to the PR where I also already commented on this:

Should I update all version directories with the changes i.e. 0.2, 0.3, 0.4?

farodin91 commented 3 years ago

Fixed with PR #55