[QUESTION/ENH] Java / VelocyPack <> Apache Arrow / Graphistry

lmeyerov commented 3 years ago

The Graphistry team is starting to get requests from ArangoDB users to help grow their Arango implementations + use cases, and we're wondering if there is any guidance for getting Arango to interop with the broader Apache / Python / etc. data community? Ideally via Parquet/ORC (cold) or, even better, Apache Arrow (in-memory / streaming / etc.)?

Most immediately, we're working with one team where the goal is Arango<>their Java app<>Graphistry.

V0: The no-thought solution is doing Arango--[json/csv]-->java--[json/csv]-->graphistry, but that means big transfers, losing existing type data, etc. On the plus side, when the customer does know the result column schema, they can send that as part of the graphistry ingest step.
V1: To do better, we're thinking Arango---[velocypack]-->Java app--[manually constructed arrow or orc typed columnar format for node+edge property tables]-->Graphistry. We're unsure what such a conversion looks like (e.g., is there any sample VelocyPack code?), especially wrt taking type/serialization wrangling pain away from Arango users by doing automated conversions.
V2: Longer term, we're thinking direct Arango--[velocypack stream]-->graphistry REST API--[velocypack stream chunk to arrow conversion]-->graphistry internal. Or better, Arango--[apache arrow/parquet/orc]-->Graphistry, if on the roadmap. In both cases, no type wrangling etc. for users.
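To make the "stream chunk to columnar conversion" idea in V2 concrete, here is a minimal plain-Python sketch. It assumes the stream has already been decoded into dict-like rows; `column_batches` and `batch_size` are hypothetical names, not part of any ArangoDB, VelocyPack, or Arrow API. Each yielded batch is a plain dict of column lists that an Arrow library could then wrap.

```python
from itertools import islice


def column_batches(rows, batch_size=1000):
    """Yield fixed-size batches of rows pivoted into column-oriented dicts.

    `rows` is any iterable of dicts (an assumption: already-decoded records).
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Collect column names in first-seen order across the chunk.
        names = []
        for row in chunk:
            for key in row:
                if key not in names:
                    names.append(key)
        # Missing attributes become None so every column has equal length.
        yield {name: [row.get(name) for row in chunk] for name in names}


if __name__ == "__main__":
    stream = ({"id": i, "label": f"n{i}"} for i in range(5))
    for batch in column_batches(stream, batch_size=2):
        print(batch)
```

Batching keeps memory bounded on large results, at the cost of each batch only seeing its own column names.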

Any pointers would be appreciated. As a simplifying constraint, users can get a lot of mileage by limiting the initial scope to node/edge queries that return primitively typed columns (string/int/date/etc.). Long-term, for fancier nested types (json, ...), the Arrow etc. ecosystem does support an increasing variety.
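As a rough illustration of the primitive-columns scope, the sketch below infers one primitive type per column from row-oriented results and pivots the rows into typed columns. It is plain Python under stated assumptions: rows are already decoded into dicts, the helper names (`infer_type`, `rows_to_columns`) are hypothetical, and the fallback-to-string rule is an illustrative choice rather than anything ArangoDB, VelocyPack, or Arrow prescribes.

```python
from datetime import datetime

# Primitive types this sketch recognizes (an assumption for illustration).
PRIMITIVES = (bool, int, float, str, datetime)


def infer_type(values):
    """Pick one primitive type for a column, ignoring nulls."""
    seen = {type(v) for v in values if v is not None}
    if seen == {int, float}:
        return float  # widen mixed ints and floats
    if len(seen) == 1 and next(iter(seen)) in PRIMITIVES:
        return next(iter(seen))
    return str  # fallback for empty, mixed, or nested columns


def rows_to_columns(rows):
    """Return (schema, columns) for a list of dict rows."""
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    columns = {name: [row.get(name) for row in rows] for name in names}
    schema = {name: infer_type(vals) for name, vals in columns.items()}
    for name, vals in columns.items():
        t = schema[name]
        # Coerce every non-null value to the inferred column type.
        columns[name] = [
            None if v is None else (v if isinstance(v, t) else t(v))
            for v in vals
        ]
    return schema, columns


if __name__ == "__main__":
    schema, columns = rows_to_columns([
        {"_key": "a", "age": 30, "score": 1.5},
        {"_key": "b", "age": None, "score": 2},
    ])
    print(schema)
    print(columns)
```

When the customer already knows the result schema, as in V0, the inference step could be skipped and the known types passed in directly.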

Thanks!

lmeyerov commented 3 years ago

cc @jsteemann as you seem to be the main contact for this :)

Helpful links:

https://github.com/graphistry/pygraphistry : where we'd like to prove out native Arango<>Graphistry support
https://rapids.ai/ + https://arrow.apache.org/ : apache arrow ecosystem, incl. gpu accelerated -- arrow/parquet/orc/pandas/spark/nvidia/etc.

jsteemann commented 3 years ago

@lmeyerov : Hi, thanks for getting in touch. Let me check who will be the contact on our side. It will not necessarily be me. Need to check it internally first. Will get back once I have more info!

lmeyerov commented 3 years ago

Thanks @jsteemann !

If it helps, we're ultimately interested in a few integration points:

-- converting arango query responses into arrow-typed record or arrow-typed node+edge property tables, e.g., https://github.com/graphistry/pygraphistry/blob/master/demos/demos_databases_apis/arango/arango_tutorial.ipynb except with types
-- dispatching 'search' queries (text, pattern, ...)
-- dispatching 'pivot' / 'expand' queries (set of IDs, potentially a pattern expression -> result graph)
-- schema fetch query, ideally also into a subgraph
-- any other graph-y queries, such as all paths between 2 points
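For the first integration point, a minimal sketch of splitting a mixed query response into node and edge property tables might look like the following. It relies on ArangoDB's convention that edge documents carry `_from` and `_to` attributes; the function names and the choice to drop `_rev` are hypothetical, and the output is plain column dicts rather than real Arrow tables.

```python
def split_nodes_edges(docs):
    """Separate documents into (nodes, edges) using the _from/_to attributes."""
    nodes, edges = [], []
    for doc in docs:
        if "_from" in doc and "_to" in doc:
            edges.append(doc)
        else:
            nodes.append(doc)
    return nodes, edges


def to_table(docs, drop=("_rev",)):
    """Pivot documents into a column dict, skipping attributes listed in `drop`."""
    names = []
    for doc in docs:
        for key in doc:
            if key not in drop and key not in names:
                names.append(key)
    # Missing attributes become None so all columns have the same length.
    return {name: [doc.get(name) for doc in docs] for name in names}


if __name__ == "__main__":
    docs = [
        {"_id": "users/1", "name": "a"},
        {"_id": "knows/9", "_from": "users/1", "_to": "users/2", "since": 2020},
        {"_id": "users/2", "name": "b", "_rev": "x"},
    ]
    nodes, edges = split_nodes_edges(docs)
    print(to_table(nodes))
    print(to_table(edges))
```

The node table is keyed by `_id` and the edge table by `_from`/`_to`, which is the shape a node+edge ingest step would typically bind against.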

We'd love to help the Java-using arangodb team be successful now, and are gearing up for a public native arango connector in q1 :)

grepler commented 2 years ago

@lmeyerov did you ever complete your native Graphistry<->ArangoDB connector?

lmeyerov commented 2 years ago

Hi @grepler, we have arangodb<>graphistry users combining the two via pydata envs like jupyter notebooks & streamlit dashboards, and via our respective JS APIs; I'm unsure about our REST API

no-code/low-code (so no python/js/...) is a longer story. we're starting to do more customer-funded projects around roadmap items, so def something we're watching out for. if relevant, happy to chat!

grepler commented 2 years ago

Thanks @lmeyerov, I'll keep experimenting - bi-directional exploration & tagging interaction with the graph model would be amazing, but I will see if I can get by with one-way visualization of our AQL graph for the time being. We're still in early internal tool development on our end. ArangoDB has some unique functionality and we really like the AQL language for its flexibility, but the third-party tooling ecosystem still seems to be in its very early days.

+1 for more ArangoDB tooling adoption! Will keep your offer in mind as we continue our testing.

lmeyerov commented 2 years ago

Great, lmk. Likewise, on the visual side, feel free to shout in our community slack.

RE: bidirectional, a relevant feature request we've heard is exposing custom action buttons in our UI, so when embedding, you can turn custom calls (tagging etc.) into an action like tagging a node in the DB. (Related, we're actively working on in-tool "grouping", such as for selecting nodes and saving as a tagged group, and "visual search", where analysts can build up pattern searches without writing cypher/aql/etc.)

arangodb / velocypack

[QUESTION/ENH] Java / VelocyPack <> Apache Arrow / Graphistry #77