ShiftLeftSecurity / overflowdb

ShiftLeft OverflowDB
Apache License 2.0

TinkerPop3 Module? #289

Open zhangziqiang1 opened 2 years ago

zhangziqiang1 commented 2 years ago

[image] Collection-valued data cannot be queried by its contents: only the collection as a whole is returned, so a list cannot be matched using the elements inside it.

At line 95 of the code, could you handle the case where the value is a list and return an iterator over its elements? The handling would be similar to the following: [image]
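Since the screenshots did not survive, here is a minimal sketch of the kind of handling being requested, assuming the goal is to match a property against the elements of a list rather than against the list object itself. The class `PropertyValues` and its methods are hypothetical names for illustration; they are not part of OverflowDB or TinkerPop, and the real code at line 95 may look quite different.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.Objects;

// Hypothetical helper for illustration only; not an OverflowDB or TinkerPop class.
final class PropertyValues {
    private PropertyValues() {}

    // Returns an iterator over the individual values of a property:
    // the elements of a collection-valued property, the single value
    // for a scalar property, or nothing for a missing (null) property.
    static Iterator<?> elements(Object propertyValue) {
        if (propertyValue == null) {
            return Collections.emptyIterator();
        }
        if (propertyValue instanceof Collection<?>) {
            return ((Collection<?>) propertyValue).iterator();
        }
        return Collections.singletonList(propertyValue).iterator();
    }

    // True if the property equals `wanted`, or, for a collection-valued
    // property, if any of its elements equals `wanted`.
    static boolean matches(Object propertyValue, Object wanted) {
        Iterator<?> it = elements(propertyValue);
        while (it.hasNext()) {
            if (Objects.equals(it.next(), wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, `PropertyValues.matches(Arrays.asList("a", "b"), "b")` and `PropertyValues.matches("a", "a")` both succeed, so scalar and list properties can be filtered through the same code path.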

mpollmeier commented 2 years ago

We just retired the tinkerpop wrappers to free up the path for a new database backend implementation. If there's demand and/or other contributors/maintainers, we may bring it back later. Would you be willing to help?

jeremysears commented 2 years ago

Hi, Michael. Thank you to you and your team for all of your contributions to this ecosystem. We are considering moving on to overflowdb, and if we do, we would be willing to take on maintenance of the TP3 wrappers in the next few months. That would also free us up to get on the latest version of your gremlin-scala repo, which would allow us to contribute the last year or more of enhancements that we haven't been able to push upstream. Is there any chance you could help me to understand more about the following questions?

mpollmeier commented 2 years ago

Hi Jeremy, I'm glad to hear that you're interested in OverflowDB! As you probably noticed, the docs aren't up to scratch, and that's actually intentional - we don't consider it quite ready for 'mainstream' adoption yet. That being said, your help in gremlin-scala shows that you know what you're doing, and you'll be able to balance around some sharp edges, so that'll be ok.

Some historical context: we initially used gremlin-scala and tinkergraph. A while back we forked tinkergraph, mostly for a better memory footprint. And finally we stopped using gremlin-scala and the tinkerpop api for performance reasons. We only used a subset of it anyway, most importantly we had no need for remote traversals, and given my background in gremlin-scala it was quite straightforward to reimplement the core parts with a similar familiar api.

The new backend is again motivated by (further) improved performance and (further) reduced memory consumption - a very early prototype with limited functionality was very promising, so we decided to continue this path, and expect a more complete prototype by the end of this month. It's using memory backed files and a readonly api, i.e. all mutations will go via [DiffGraphs](https://github.com/ShiftLeftSecurity/overflowdb/blob/master/core/src/main/java/overflowdb/BatchedUpdate.java).
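To make the "readonly api plus batched mutations" idea concrete, here is a rough sketch of the general pattern, assuming a graph reduced to a map from node id to label. `ReadOnlyGraph` and `GraphDiff` are hypothetical names invented for this illustration; they are not the OverflowDB `BatchedUpdate` API and say nothing about how the new backend stores data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical types for illustration only; NOT the OverflowDB classes.
final class GraphDiff {
    // One recorded change: a null label means "remove this node".
    static final class Change {
        final long id;
        final String label;
        Change(long id, String label) { this.id = id; this.label = label; }
    }

    private final List<Change> changes = new ArrayList<>();

    GraphDiff addNode(long id, String label) {
        changes.add(new Change(id, Objects.requireNonNull(label)));
        return this;
    }

    GraphDiff removeNode(long id) {
        changes.add(new Change(id, null));
        return this;
    }

    List<Change> changes() { return Collections.unmodifiableList(changes); }
}

final class ReadOnlyGraph {
    private final Map<Long, String> nodeLabels; // node id -> label

    private ReadOnlyGraph(Map<Long, String> nodeLabels) {
        this.nodeLabels = Collections.unmodifiableMap(nodeLabels);
    }

    static ReadOnlyGraph empty() {
        return new ReadOnlyGraph(new LinkedHashMap<Long, String>());
    }

    // The query side exposes reads only; there are no mutators.
    String label(long id) { return nodeLabels.get(id); }
    int nodeCount() { return nodeLabels.size(); }

    // The only way to change anything: apply a whole diff in one step,
    // producing a new graph and leaving this one untouched.
    ReadOnlyGraph apply(GraphDiff diff) {
        Map<Long, String> next = new LinkedHashMap<>(nodeLabels);
        for (GraphDiff.Change c : diff.changes()) {
            if (c.label == null) {
                next.remove(c.id);
            } else {
                next.put(c.id, c.label);
            }
        }
        return new ReadOnlyGraph(next);
    }
}
```

A caller would build up a diff such as `new GraphDiff().addNode(1L, "METHOD").addNode(2L, "CALL")` and pass it to `apply`. Changes recorded in the diff are not visible to readers until that point, which is the property that matters for code expecting read-after-write behaviour.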

Can you describe your use case in a nutshell? Is it (like ours) mostly read-oriented? We typically create graphs by adding up a few large layers, and then only query them.

It's quite likely that the API will see some changes, so I wouldn't recommend basing your entire stack on it quite yet. You can however start with a prototype to test the waters, get to know the different parts (schema, codegen, core, traversal, ...) to see if it works for you and if you get similar benefits from it as we did. By the time you have all that, we'll have some of the open questions answered...

jeremysears commented 2 years ago

I presume our use cases are quite similar; we're also building static analysis systems. We're memory sensitive, but we're currently balanced between read/write. We also have several use-cases for graph layering, but it isn't a build-then-query model like yours.

We have some desire to remain abstracted from our graph backend, which is why we're trying to stay on TinkerPop. We've played w/ several persistent backends (DataStax, JanusGraph on Cassandra/ScyllaDB, and others), but ultimately we stuck with TinkerGraph. I stumbled on your TinkerGraph fork a while back, when I was considering going down the same route to eliminate its use of Maps/Sets, and we are still on that today. However, we've been stuck on an older version that predates the introduction of the overflow changes. We just freed up some cycles to plot a way forward, and we're hoping to make some contributions along the way. Here are some of the options we are considering.

1) Fork ShiftLeftSecurity/tinker-graph before the overflow changes, remove dependencies on Vertex/Edge super classes, and bring up to date with the latest TP.
2) Pull ShiftLeftSecurity/overflowdb before the TP removal. Wait until the storage layer stabilizes. If the new storage layer is appropriate for our use cases, and we think the API will stabilize, we could maintain the TP wrapper as overflowdb evolves. Otherwise, we could fork at that point.
3) Re-implement our gremlin-scala domain specific DSL on top of overflowdb's traversal layer, while rewriting all mutations using DiffGraphs.

We don't like the idea of forking, so we would like to avoid that if we can. The first two options would free us up to push our gremlin-scala enhancements upstream. Our mutations are generally fairly simple, so the third option may be viable, but our code expects read-after-write semantics, so that could get involved. Our gremlin-scala case-classes, vertices/edges, and DSL Steps are all generated, so some of that would also be mechanical. I agree that a prototype is in order. Thanks for getting back to me so quickly. Stay well, thank you for your help, and thank you and your team for all of your contributions!
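For the read-after-write concern under the third option, one possible approach is an overlay that answers reads from pending changes first and falls back to the committed state. The sketch below assumes a simple key/value store standing in for the graph; `PendingWrites` and all of its methods are hypothetical and do not come from OverflowDB or gremlin-scala.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch; none of these names come from OverflowDB or gremlin-scala.
final class PendingWrites {
    private final Map<String, String> committed = new HashMap<>();   // state after the last flush
    private final Map<String, String> pendingPuts = new HashMap<>(); // writes not yet flushed
    private final Set<String> pendingRemoves = new HashSet<>();      // removals not yet flushed

    // Records a write without touching the committed state.
    void put(String key, String value) {
        Objects.requireNonNull(value);
        pendingRemoves.remove(key);
        pendingPuts.put(key, value);
    }

    // Records a removal without touching the committed state.
    void remove(String key) {
        pendingPuts.remove(key);
        pendingRemoves.add(key);
    }

    // Read-after-write view: pending changes shadow the committed state.
    String get(String key) {
        if (pendingRemoves.contains(key)) {
            return null;
        }
        String pending = pendingPuts.get(key);
        return pending != null ? pending : committed.get(key);
    }

    // What a reader of the underlying store sees before the batch is applied.
    String getCommitted(String key) {
        return committed.get(key);
    }

    // Applies all pending changes in one batch, then clears them.
    void flush() {
        committed.keySet().removeAll(pendingRemoves);
        committed.putAll(pendingPuts);
        pendingRemoves.clear();
        pendingPuts.clear();
    }
}
```

After `put("n1", "METHOD")`, `get("n1")` returns the new value immediately while `getCommitted("n1")` still returns `null` until `flush()` runs. The trade-off is that every read path has to go through the overlay, which is where this could get involved for a generated DSL.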