Open lmeyerov opened 3 years ago
cc @jsteemann as you seem to be the main contact for this :)
Helpful links:
https://github.com/graphistry/pygraphistry : where we'd like to prove out native Arango<>Graphistry support
https://rapids.ai/ + https://arrow.apache.org/ : apache arrow ecosystem, incl. gpu accelerated -- arrow/parquet/orc/pandas/spark/nvidia/etc.
@lmeyerov : Hi, thanks for getting in touch. Let me check who will be the contact on our side. It will not necessarily be me. Need to check it internally first. Will get back once I have more info!
Thanks @jsteemann !
If it helps, we're ultimately interested in a few integration points:
-- converting arango query responses into arrow-typed record or arrow-typed node+edge property tables, e.g., https://github.com/graphistry/pygraphistry/blob/master/demos/demos_databases_apis/arango/arango_tutorial.ipynb except with types
-- dispatching 'search' queries (text, pattern, ...)
-- dispatching 'pivot' / 'expand' queries (set of IDs, potentially a pattern expression -> result graph)
-- schema fetch query, ideally also into a subgraph
-- any other graph-y queries, such as all paths between 2 points
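To make the first item concrete, here is a rough stdlib-only Python sketch of turning query result documents into typed node and edge property tables. It relies on ArangoDB edge documents carrying `_from` and `_to`; the helper names (`split_nodes_edges`, `infer_type`, `to_typed_table`), the type labels, and the sample documents are all made up for illustration and are not part of any Arango or Graphistry API.

```python
def split_nodes_edges(docs):
    """Separate ArangoDB-style documents: edge documents carry `_from` and `_to`."""
    nodes, edges = [], []
    for doc in docs:
        (edges if "_from" in doc and "_to" in doc else nodes).append(doc)
    return nodes, edges


def infer_type(values):
    """Name one primitive type for a column (labels are illustrative, not Arrow's)."""
    kinds = {type(v) for v in values if v is not None}
    if not kinds:
        return "null"
    if kinds == {bool}:
        return "bool"
    if kinds == {int}:
        return "int64"
    if kinds <= {int, float}:
        return "float64"
    return "string"  # plain strings, and the fallback for mixed columns


def to_typed_table(rows):
    """Pivot row documents into columns, padding missing attributes with None."""
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    columns = {name: [row.get(name) for row in rows] for name in names}
    schema = {name: infer_type(vals) for name, vals in columns.items()}
    return columns, schema


# Hypothetical query result mixing vertex and edge documents.
docs = [
    {"_id": "users/1", "name": "ann", "age": 31},
    {"_id": "users/2", "name": "bob", "age": None},
    {"_id": "knows/9", "_from": "users/1", "_to": "users/2", "weight": 0.5},
]

nodes, edges = split_nodes_edges(docs)
node_cols, node_schema = to_typed_table(nodes)
edge_cols, edge_schema = to_typed_table(edges)
```

The column dictionaries are shaped so they could be handed to something like `pyarrow.table(node_cols)` or a pandas DataFrame, though that step is left out here to keep the sketch dependency-free.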
We'd love to help the Java-using arangodb team be successful now, and are gearing up for a public native arango connector in q1 :)
@lmeyerov did you ever complete your native Graphistry<->ArangoDB connector?
Hi @grepler, we have arangodb<>graphistry users combining the two via pydata envs like jupyter notebooks & streamlit dashboards and via our respective JS APIs; I'm unsure whether anyone is doing so with our REST API
no-code/low-code (so no python/js/...) is a longer story. we're starting to do more customer-funded projects around roadmap items, so def something we're watching out for. if relevant, happy to chat!
Thanks @lmeyerov, I'll keep experimenting - bi-directional exploration & tagging interaction with the graph model would be amazing, but I will see if I can get by with one-way visualization of our AQL graph for the time being. We're still in early internal tool development on our end. ArangoDB has some unique functionality and we really like the AQL language for its flexibility, but the third-party tooling ecosystem still seems to be in its very early days.
+1 for more ArangoDB tooling adoption! Will keep your offer in mind as we continue our testing.
Great, lmk. Likewise, on the visual side, feel free to shout in our community slack.
Re: bidirectional, a relevant feature request we've heard is exposing custom action buttons in our UI, so when embedding, you can turn custom calls (tagging etc.) into an action like tagging a node in the DB. (Related, we're actively working on in-tool "grouping", such as for selecting nodes and saving as a tagged group, and "visual search", where analysts can build up pattern searches without writing cypher/aql/etc.)
The Graphistry team is starting to get requests from ArangoDB users to help grow their Arango implementations + use cases, and we're wondering if there is any guidance for getting Arango to interop with the broader Apache / Python / etc. data community? Ideally, via parquet/orc (cold) or even better, apache arrow (in-memory / streaming / etc.)?
Most immediately, we're working with one team where the goal is Arango<>their Java app<>Graphistry.
V0: The no-thought solution is doing Arango--[json/csv]-->java--[json/csv]-->graphistry, but that means big transfers, losing existing type data, etc. On the plus side, when the customer does know the result column schema, they can send that as part of the graphistry ingest step.
V1: To do better, we're thinking Arango---[velocypack]-->Java app--[manually constructed arrow or orc typed columnar format for node+edge property tables]-->Graphistry. Though we're unsure what such a conversion looks like, e.g., any sample VelocyPack code, and especially wrt taking type/serialization wrangling pain away from Arango users by doing automated conversions.
V2: Longer term, we're thinking direct Arango--[velocypack stream]-->graphistry REST API--[velocypack stream chunk to arrow conversion]-->graphistry internal. Or better, Arango--[apache arrow/parquet/orc]-->Graphistry, if on the roadmap. In both cases, no type wrangling etc. for users.
Any pointers would be appreciated. As simplifying constraints, users can get a lot of mileage by limiting the initial scope to node/edge queries that return primitively typed columns (string/int/date/etc.). Long-term, for fancier nested types (json, ...), Arrow etc. ecosystem do support an increasing variety.
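For the V0 case where the customer already knows the result column schema, a rough Python sketch of re-applying that schema to an untyped CSV export might look like the following. It is restricted to the primitive column types mentioned above (string/int/float/date); the `CASTS` table, the `read_typed_csv` helper, the schema labels, and the sample data are assumptions for illustration only, not the Graphistry ingest format.

```python
import csv
import io
from datetime import date

# Hypothetical mapping from declared column type to a parser for the CSV string.
CASTS = {
    "string": str,
    "int": int,
    "float": float,
    "date": date.fromisoformat,  # expects ISO dates such as 2021-01-05
}


def read_typed_csv(text, schema):
    """Parse CSV text into columns, casting each cell per the declared schema.

    Empty cells become None so that nulls survive the round trip.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in schema}
    for row in reader:
        for name, kind in schema.items():
            raw = row.get(name)
            columns[name].append(None if raw in (None, "") else CASTS[kind](raw))
    return columns


# Made-up export and user-declared schema.
csv_text = "id,score,seen\nusers/1,3,2021-01-05\nusers/2,,2021-02-11\n"
schema = {"id": "string", "score": "int", "seen": "date"}
columns = read_typed_csv(csv_text, schema)
```

This only recovers types the user declares up front, which is exactly the wrangling that V1 and V2 aim to remove by carrying types through from the database.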
Thanks!