RumbleDB / rumble

⛈️ RumbleDB 1.22.0 "Pyrenean oak" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
http://rumbledb.org/

HTTP Parameters #1196

Open l0rn0r opened 2 years ago

l0rn0r commented 2 years ago

Hi, I'm running RumbleDB as a server in Docker (current image), and it works with Jupyter Notebooks. Now I'm trying to send HTTP requests directly. For example, http://localhost:8001/jsoniq?query-path=/home/query.jq works (/home is mounted into the container), where query.jq contains json-file("/home/data.json").
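For reference, the working request can be reproduced from Python with only the standard library. This is a minimal sketch: port 8001 and the file path come from the setup described above, while the helper names `build_query_path_request` and `fetch` are hypothetical.

```python
import urllib.parse
import urllib.request

# RumbleDB HTTP server endpoint, as exposed by the Docker container in this issue.
BASE_URL = "http://localhost:8001/jsoniq"


def build_query_path_request(query_path: str) -> urllib.request.Request:
    """Build a GET request asking the server to run a query file.

    The path is resolved by the server, i.e. inside the container,
    which is why /home has to be mounted there.
    """
    # Keep '/' unescaped so the path appears literally in the URL.
    params = urllib.parse.urlencode({"query-path": query_path}, safe="/")
    return urllib.request.Request(f"{BASE_URL}?{params}", method="GET")


def fetch(req: urllib.request.Request) -> str:
    """Send a request and return the response body as text (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With the container running, `fetch(build_query_path_request("/home/query.jq"))` should return the same result as opening the URL in a browser.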

But when I try to pass the query itself as an HTTP parameter, e.g. http://localhost:8001/jsoniq?query='1+1', I get a "Parser failed." [XPST0003] error response. The documentation at https://github.com/RumbleDB/rumble/blob/master/docs/HTTPServer.md#testing-that-it-works-not-necessary-for-most-end-users says: "Almost all parameters from the command line are exposed as HTTP parameters." And the command line does have a query parameter: https://rumble.readthedocs.io/en/latest/CLI/

So how can I use the HTTP parameters?

The error-message when posting with the query parameter is:

{ "error-message" : "There was an error on line 1 in file:\/var\/spark\/:\n\n\n^\n\nCode: [XPST0003]
Message: Parser failed.
Metadata: file:\/var\/spark\/:LINE:1:COLUMN:0:
This code can also be looked up in the documentation and specifications for more information.
", "error-code" : "XPST0003", "stack-trace" : [
"org.rumbledb.compiler.VisitorHelpers.parseJSONiqMainModule(VisitorHelpers.java:146)",
"org.rumbledb.compiler.VisitorHelpers.parseMainModule(VisitorHelpers.java:114)",
"org.rumbledb.compiler.VisitorHelpers.parseMainModuleFromQuery(VisitorHelpers.java:99)",
"org.rumbledb.api.Rumble.runQuery(Rumble.java:44)",
"org.rumbledb.cli.JsoniqQueryExecutor.runInteractive(JsoniqQueryExecutor.java:221)",
"org.rumbledb.server.RumbleHttpHandler.handle(RumbleHttpHandler.java:115)",
"com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)",
"sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)",
"com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)",
"sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)",
"com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)",
"sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)",
"sun.net.httpserver.ServerImpl$DefaultExecutor.execute(ServerImpl.java:158)",
"sun.net.httpserver.ServerImpl$Dispatcher.handle(ServerImpl.java:431)",
"sun.net.httpserver.ServerImpl$Dispatcher.run(ServerImpl.java:396)", "java.lang.Thread.run(Thread.java:748)" ] }

The same happens with ?query="1+1" and ?query=1+1.
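One detail worth knowing when testing these variants: in a URL query string, an unescaped `+` is conventionally decoded as a space (form encoding), so `?query=1+1` may reach a server as `1 1` rather than `1+1`, and any quotes become part of the query text. The sketch below uses Python's `urllib.parse` to show the difference. It assumes nothing about how RumbleDB itself decodes parameters; the error above, which points at column 0 of an apparently empty input, is consistent with the server not reading the query from this parameter at all, as the reply below suggests.

```python
from urllib.parse import parse_qs, urlencode

# A raw '+' in a query string is decoded as a space under form encoding.
raw = parse_qs("query=1+1")
print(raw)        # {'query': ['1 1']}

# Proper encoding turns '+' into %2B, which decodes back to '+'.
encoded = urlencode({"query": "1+1"})
print(encoded)    # query=1%2B1
decoded = parse_qs(encoded)
print(decoded)    # {'query': ['1+1']}
```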

Thanks for any help. Best, Jonas

ghislainfourny commented 1 year ago

Thanks for your question, and apologies for not seeing it earlier!

If I remember correctly, the query should be supplied in the body of an HTTP POST request rather than as a query parameter. I will update the documentation to make this clear, and I will also consider making the query parameter work as you expect in a future release. Thanks for the heads-up!
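Following that suggestion, a request that sends the query text as the body of a POST could look like the sketch below. It assumes, based on the recollection above rather than on documentation, that the server reads the raw JSONiq text from the body; the `text/plain` content type and the helper names `build_post_query` and `run_query` are assumptions, not part of RumbleDB.

```python
import urllib.request

# Endpoint from the issue above (Docker container exposing port 8001).
ENDPOINT = "http://localhost:8001/jsoniq"


def build_post_query(query: str) -> urllib.request.Request:
    """Build a POST request whose body is the raw JSONiq query text.

    Assumption: the server reads the query from the request body.
    The Content-Type header is a guess; the server may ignore it.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=query.encode("utf-8"),  # no URL encoding needed for the body
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def run_query(query: str) -> str:
    """Send the query and return the raw response body (needs a running server)."""
    with urllib.request.urlopen(build_post_query(query)) as resp:
        return resp.read().decode("utf-8")
```

Because the query travels in the body, `run_query("1+1")` sends the `+` as-is, avoiding the query-string decoding issue entirely.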

Note that RumbleDB's HTTP server is very simplistic and is mainly there as a backend for the Jupyter Notebook (used with the RumbleDB PyPI library). An even better way to interact with RumbleDB over a server is through Apache Livy, which most cloud providers expose when you spin up a Spark cluster. Livy is more robust for complex use cases because it forwards everything to the command-line RumbleDB.
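For the Livy route, a job is submitted by POSTing a JSON description to Livy's `/batches` endpoint, with `file` naming the application jar and `args` holding its command-line arguments. The sketch below only builds that request. The Livy host (`livy-host` is a placeholder; 8998 is Livy's default port), the jar location, and the RumbleDB argument `--query-path` are assumptions to adapt to your cluster and RumbleDB version; check the CLI documentation linked above for the exact argument names.

```python
import json
import urllib.request

# Assumptions: Livy's REST endpoint and a RumbleDB jar on storage the cluster can read.
LIVY_URL = "http://livy-host:8998/batches"
RUMBLE_JAR = "hdfs:///path/to/rumbledb.jar"  # hypothetical path


def build_livy_batch(query_path: str) -> urllib.request.Request:
    """Build a Livy batch submission that runs RumbleDB on a query file.

    The args mirror RumbleDB's command-line parameters; '--query-path'
    is an assumption to verify against your version's CLI docs.
    """
    payload = {
        "file": RUMBLE_JAR,
        "args": ["--query-path", query_path],
    }
    return urllib.request.Request(
        LIVY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending this request with `urllib.request.urlopen` would return Livy's JSON description of the new batch, whose state can then be polled.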