RumbleDB / rumble

⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
http://rumbledb.org/

Question: JSONiq to SQL #1259

Open jsommr opened 2 weeks ago

jsommr commented 2 weeks ago

I keep coming back to JSONiq because it's such an elegant language, and it would be cool if I could use it for an API instead of, e.g., GraphQL. For the latter there are tools such as Hasura that auto-generate a GraphQL API for a database and use the GraphQL AST to generate a single SQL query for that database (simply put).

Would it make sense (and be possible) for RumbleDB to act as an API that turns a JSONiq POST request into SQL, queries a database (efficiently, with as much SQL packed into one query as possible), and returns the result? I don't know enough about Spark, but ClickHouse has a Spark connector. Would it be possible to connect RumbleDB to that (or to similar connectors for other databases) and have requests pushed down to the database (including aggregates like avg, sum, etc.)?
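To make the idea concrete, here is a minimal sketch of what "one JSONiq query in, one SQL query out" could look like. It is not RumbleDB code: the `GroupedAggregate` class is a hypothetical stand-in for a parsed FLWOR expression, the `orders` table and its columns are made up, and SQLite from the Python standard library plays the role of the backend database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupedAggregate:
    """Hypothetical, heavily simplified stand-in for a parsed JSONiq FLWOR like:

        for $o in collection("orders")
        where $o.amount gt 10
        group by $c := $o.customer
        return { "customer": $c, "avg": avg($o.amount) }
    """
    collection: str                    # maps to a table name
    group_key: str                     # maps to the GROUP BY column
    agg_func: str                      # avg, sum, min, max or count
    agg_field: str                     # column being aggregated
    min_value: Optional[float] = None  # optional "where field gt value" filter


ALLOWED_AGGS = {"avg", "sum", "min", "max", "count"}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so that it cannot break out of the query."""
    return '"' + name.replace('"', '""') + '"'


def to_sql(q: GroupedAggregate):
    """Translate the whole query into a single SQL statement plus parameters."""
    if q.agg_func not in ALLOWED_AGGS:
        raise ValueError(f"unsupported aggregate: {q.agg_func}")
    key = quote_ident(q.group_key)
    field = quote_ident(q.agg_field)
    sql = (f"SELECT {key}, {q.agg_func.upper()}({field}) "
           f"FROM {quote_ident(q.collection)}")
    params = []
    if q.min_value is not None:
        sql += f" WHERE {field} > ?"
        params.append(q.min_value)
    sql += f" GROUP BY {key} ORDER BY {key}"
    return sql, params


def run(conn: sqlite3.Connection, q: GroupedAggregate):
    """Execute the generated SQL and shape the rows as JSON-like objects."""
    sql, params = to_sql(q)
    return [{q.group_key: k, q.agg_func: v} for k, v in conn.execute(sql, params)]


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 20.0), ("alice", 40.0), ("bob", 5.0), ("bob", 50.0)],
    )
    return conn
```

Calling `run(make_demo_db(), GroupedAggregate("orders", "customer", "avg", "amount", min_value=10))` sends exactly one statement to the database, with the filter, the grouping, and the aggregate all evaluated there. A real translation would of course have to cover far more of JSONiq than this single query shape.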

ghislainfourny commented 1 day ago

Thank you for your feedback @jsommr! Apologies as I only saw it now.

Generally, it does make a lot of sense to allow RumbleDB to connect to other backends. If that backend has a Spark connector, then it should even be relatively easy. For example, I tried with MongoDB (which also has a Spark connector), and it took only an afternoon of work to connect it.

Regarding pushdowns: absolutely, this is even desirable. Typically, one builds a first prototype that just generally connects to the backend, and then one keeps adding pushdowns and optimizations to make it faster.
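As a rough illustration of that progression (a sketch only, not RumbleDB or Spark code), the two functions below compute the same per-customer average. The first mimics a first prototype that simply reads everything from the backend and aggregates on the client; the second pushes the aggregate down into the database. The `orders` table, its columns, and the function names are made up for the example, and SQLite from the Python standard library stands in for the backend.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 20.0), ("alice", 40.0), ("bob", 5.0), ("bob", 50.0)],
    )
    return conn


def avg_by_customer_prototype(conn: sqlite3.Connection) -> dict:
    """First prototype: fetch every row, then aggregate on the client side."""
    totals = defaultdict(lambda: [0.0, 0])  # customer -> [sum, count]
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer][0] += amount
        totals[customer][1] += 1
    return {customer: s / n for customer, (s, n) in totals.items()}


def avg_by_customer_pushed_down(conn: sqlite3.Connection) -> dict:
    """With pushdown: the database groups and averages; one row per group comes back."""
    rows = conn.execute(
        "SELECT customer, AVG(amount) FROM orders GROUP BY customer"
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping, but the second transfers one row per customer instead of one row per order, which is where the speedup from a pushdown comes from on large tables.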

Thank you for the nice words on JSONiq!