espeed / bulbs

A Python persistence framework for graph databases like Neo4j, OrientDB and Titan.
http://bulbflow.org

Support for Key Indexes? #112

Open dlcarmean opened 10 years ago

dlcarmean commented 10 years ago

Manual vertex/edge indices seem to be deprecated in most (all?) of the graph engines these days. Are there any plans yet to refactor bulbs to, at the least, add support for key indexes? (I'd also like to see the term 'automatic' de-confounded; currently in bulbs, "auto*" means "automagically update the manual indices, either the default or a manual index created by adding an Element Proxy".)

I added some defs to bulbs/gremlin.groovy based on the existing get_or_create... but once I started looking at the corresponding changes needed in the python code... I thought I'd better ask about "plans" before I put too much effort in.

espeed commented 10 years ago

Cursory Key Index support was added to Bulbs Titan a few months ago.

See...

Prior to this year, all of the Blueprints/Neo4j Java autoindex implementations used manual indices under the hood.

Manual indices are definitely going away in TinkerPop 3, which is undergoing a massive redesign and is scheduled for summer 2014.

Before we switch Bulbs over to the new autoindices, we need to nail down how we are going to handle composite indices so that when you look up a name using a Person model you don't get results that include names from a Pet model.

Prefixing keys with the model's element_type would work, but it's clunky and would break existing code.

Marko has suggested something like this:

g.V.has('element_type', 'person').has('name', 'marko')

However, if the underlying DB does not support composite indices, it could result in a linear scan.

See https://groups.google.com/d/msg/gremlin-users/G8Bpohk9GwA/AonV5izsvt4J
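To make the Person/Pet collision concrete, here is a minimal in-memory sketch in plain Python. It is not the Bulbs, Blueprints, or Titan API; `VERTICES` and `build_index` are hypothetical names used only for illustration. It shows a name-only key index returning both models, a composite index on (element_type, name) returning only the person, and the fallback of filtering name-index hits by element_type when no composite index exists.

```python
from collections import defaultdict

# Hypothetical stand-in for a graph's vertices; not real Bulbs data structures.
VERTICES = [
    {"id": 1, "element_type": "person", "name": "marko"},
    {"id": 2, "element_type": "pet", "name": "marko"},
    {"id": 3, "element_type": "person", "name": "josh"},
]


def build_index(vertices, keys):
    """Map a tuple of property values to the ids of the vertices that have them."""
    index = defaultdict(list)
    for v in vertices:
        index[tuple(v[k] for k in keys)].append(v["id"])
    return index


# Single-key index: a lookup by name cannot tell a Person from a Pet.
name_index = build_index(VERTICES, ["name"])
name_hits = name_index[("marko",)]

# Composite index: one lookup covers both has() conditions.
composite_index = build_index(VERTICES, ["element_type", "name"])
composite_hits = composite_index[("person", "marko")]

# No composite index: use the name index, then filter the hits by element_type.
by_id = {v["id"]: v for v in VERTICES}
filtered_hits = [i for i in name_hits if by_id[i]["element_type"] == "person"]

print(name_hits, composite_hits, filtered_hits)
```

The filtering fallback only touches the vertices that share the name, so it is cheaper than a full scan, but it still depends on at least one of the two keys being indexed.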

DB support for composite indices is still in a state of flux, and I'm not even sure Matthias has nailed down how it's going to work in Titan (and that was the DB where Key Indices originated).

Also, Neo4j Server is moving away from Gremlin to Cypher so we need to find a way to make this work in Cypher too.

What server and DB are you using?

dlcarmean commented 10 years ago

Right now, all of my graphdb work is just exploratory, and I've been experimenting with various dbs/servers: neo4j server, neo4j and tinkergraph Embedded and also through Rexster Server--all through gremlin/groovysh for the past couple of weeks.

Barring adding implicit magic to g.V and g.E to allow calling them with key/value lookups as in Gremlin, if the g.V.has(k, v) were transformed into the 'best available' method on the back end (key index lookup or the next best thing) that would probably work fine. I guess I'd want to document that magic was taking place and perhaps add some info-level logging to that effect, but .. settling on the simplest API would be good.
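The "best available method" idea could look roughly like the sketch below. It assumes a hypothetical in-memory `TinyGraph` class rather than any real Bulbs or Rexster interface: `has(key, value)` uses a key index when one exists for that key, otherwise falls back to a linear scan, and logs at info level which path it took.

```python
import logging

logger = logging.getLogger("key_lookup_sketch")


class TinyGraph:
    """Hypothetical in-memory graph; stands in for a real backend."""

    def __init__(self, vertices, indexed_keys=()):
        self.vertices = list(vertices)
        # key -> {value -> [vertex, ...]} for every key that has a key index
        self.key_indices = {}
        for key in indexed_keys:
            idx = {}
            for v in self.vertices:
                if key in v:
                    idx.setdefault(v[key], []).append(v)
            self.key_indices[key] = idx

    def has(self, key, value):
        """Return (strategy, matches), choosing the best available lookup."""
        if key in self.key_indices:
            logger.info("has(%r, %r): using key index", key, value)
            return "key_index", list(self.key_indices[key].get(value, []))
        logger.info("has(%r, %r): no key index, falling back to linear scan", key, value)
        return "linear_scan", [v for v in self.vertices if v.get(key) == value]


g = TinyGraph(
    [{"name": "marko", "age": 29}, {"name": "josh", "age": 32}],
    indexed_keys=["name"],
)
indexed_result = g.has("name", "marko")
scanned_result = g.has("age", 32)
```

Returning the strategy alongside the matches is only a convenience for this sketch; a real API would more likely return just the results and leave the strategy to the log.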

At this point I'm starting to think that I should bite the bullet and use the current uncertainty to force myself to become fluent in Java, since Groovy is showing itself to be a great introduction to it. Frames seems to be roughly equivalent to Bulbs in that respect and it's looking fairly easy to mix a Groovy/Gremlin DSL in with that.

mariomcginley commented 10 years ago

Any progress on this issue? I tried using the latest bulbs with titan-server-0.4.0. I'm getting a warning on a g.query():

[WARN] StandardTitanTx$5 - Query requires iterating over all vertices [(v[106] = accountANDv[146] = test)]. For better performance, use indexes

And an error on a g.query().has("title",CONTAINS,"test"):

java.lang.IllegalArgumentException: Data type of key is not compatible with condition

"title" is a String() on a Node model. Thanks.

espeed commented 10 years ago

@mariomcginley make sure you create the Titan indices before you add data.

You can create a basic KeyIndex through Rexster (Bulbs provides an interface for that https://gist.github.com/espeed/3938820).

For custom index options, use a Gremlin script or the Gremlin REPL to create the index. See...