debug-ito / greskell

Haskell binding for Gremlin graph query language
https://hackage.haskell.org/package/greskell

Indexing a graph in Gremlin Server #3

Closed: JeffreyBenjaminBrown closed this issue 5 years ago

JeffreyBenjaminBrown commented 5 years ago

I want to manipulate a graph hosted by Gremlin Server. Greskell will let me do most things, and I've reached hello world with it, so to speak: when I start Gremlin Server in a Docker container, I can interact with it from GHCi. (The README lists the steps I take to do that.)

Greskell, however, seems to provide no indexing facilities. (Nor does Gremlin itself provide abstractions for indexing, although Gremlin does provide plugins for specific implementations, e.g. Neo4j.)

If I've got a graph hosted by Gremlin Server, and I want to use Greskell (at least most of the time) to manipulate it, is there a way to index certain properties?

Hackage offers hasbolt for manipulating Neo4j graphs directly. If I were to use that, I would have to somehow tell Gremlin Server to use Neo4j (I don't know what it uses by default) and then point hasbolt at Gremlin Server, which for all I know doesn't even make sense.

(This might be TMI, but I tried to make a custom Docker image for Gremlin Server that includes a few Neo4j config files borrowed from another project. The result ought to be launchable via /opt/gremlin-server/bin/gremlin-server.sh /mnt/config/gremlin-server.yaml, but when I run that I get parse errors.)

JeffreyBenjaminBrown commented 5 years ago

I just found some Neo4j scripts in the Docker image for Gremlin Server. I'll see if I can get this working tonight after work.

debug-ito commented 5 years ago

Thanks for bringing up this topic.

I've decided that creating indexes in the graph database is out of scope for Greskell. As you already pointed out, TinkerPop Gremlin provides no standardized way to configure indexes (as far as I know), and because Greskell is a binding of Gremlin, it does not have that feature either.

JeffreyBenjaminBrown commented 5 years ago

Joshua Shinavier (one of TinkerPop's authors) has largely led me to believe that indexing can be made automatic by setting up a neo4j.properties file. The file neo4j-empty.properties can be found in a few places, notably in the Docker image for gremlin-server (in /opt/gremlin/conf). That file doesn't show how to specify which properties to index, but Josh thinks Neo4j will index properties as long as auto_indexing=true, even if you don't specify which indexes to make. My guess is that under those conditions it will create every possible single-key index and no others.
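For readers trying this route, here is a minimal sketch of what such a properties file might look like. It assumes TinkerPop's neo4j-gremlin plugin, which passes any key prefixed with gremlin.neo4j.conf. through to the embedded Neo4j instance, and a Neo4j 3.x release that still supports legacy auto-indexing. The data directory and the indexed keys (name, age) are hypothetical placeholders, and the Neo4j setting names differ between Neo4j versions, so treat them as assumptions to check against the Neo4j configuration reference for the version bundled with your Gremlin Server.

```properties
# Hypothetical neo4j.properties, modeled on the shape of neo4j-empty.properties.
# Use the Neo4j-backed graph instead of Gremlin Server's default TinkerGraph.
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
# Directory where Neo4j keeps its data (placeholder path).
gremlin.neo4j.directory=/tmp/neo4j
# Everything under gremlin.neo4j.conf. is forwarded to Neo4j itself.
# Legacy auto-indexing switches (Neo4j 3.x names; an assumption, verify for your version).
gremlin.neo4j.conf.dbms.auto_index.nodes.enabled=true
gremlin.neo4j.conf.dbms.auto_index.relationships.enabled=true
# Optionally name the property keys to auto-index explicitly (hypothetical keys).
gremlin.neo4j.conf.dbms.auto_index.nodes.keys=name,age
```

Listing the keys explicitly avoids relying on whatever Neo4j does when auto-indexing is enabled without them.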

I'm in the middle of figuring out how to enable the format_migration and store_upgrade options, which are not part of the default .properties file. My efforts, in case they are of interest to future readers of this issue, can be found here. (Alternatively, here's a permanent link to the latest commit of that work, but it's likely to improve in the near future.)

JeffreyBenjaminBrown commented 5 years ago

Joshua Shinavier thinks TinkerPop 4 should be out before the end of the year, and says it will have native support for indexing.