Closed by pjfanning 5 years ago
Hello and many thanks for your interest in Rumble.
I think it should be doable in a simple way by using the same internal JSONiq execution API used by the CLI and the Shell. It may be a nice opportunity to make this API "official" for all who want to invoke queries in other ways than the Shell (including HTTP). Also, this is something that would be nice to have as a public "try-it-out" page.
Before we dig deeper, could you elaborate on these two points? The simplest HTTP server I can imagine receives a JSONiq query (POST), reads no input data, and outputs something small enough to be sent as an HTTP response. A more elaborate HTTP server would have an underlying cluster.
Thanks!
Thanks @ghislainfourny for your detailed response. I agree that there are quite a few different scenarios that could be supported. In my use case, I would be dealing with reasonably sized data sets, so I would like to return the data in the HTTP response. The data to process would already be stored in HDFS or maybe in AWS S3.
It makes sense.
I would recommend giving it a first try by reusing JsoniqQueryExecutor.run() like so, after saving the query received in the HTTP POST request under the path querypath on HDFS, and picking some location outputpath on HDFS:
String querypath = "hdfs://host:port/user/hadoop/query.jq"; // JSONiq query should have been copied to this location
String outputpath = "hdfs://host:port/user/hadoop/output"; // make sure it does not already exist!
SparksoniqRuntimeConfiguration sparksoniqConf = new SparksoniqRuntimeConfiguration(new String[] { "--result-size", "1000" }); // simulates CLI parameters; a higher value allows more objects in the output, but setting it too high risks a crash
JsoniqQueryExecutor rumbleEngine = new JsoniqQueryExecutor(false, sparksoniqConf);
rumbleEngine.run(querypath, outputpath);
Then, you can read from outputpath, concatenate the files and output them as an HTTP response.
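The "concatenate the files" step could look roughly like the sketch below. It assumes the output directory contains files named part-* (the usual Spark naming) and, to stay self-contained, it reads from a local directory with java.nio.file. For a real hdfs:// location the same loop would go through the Hadoop FileSystem API instead. The class name OutputConcatenator and the method concatenateParts are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Rumble.
final class OutputConcatenator {

    /** Concatenates all part-* files of an output directory, in file name order. */
    static String concatenateParts(Path outputDir) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(outputDir)) {
            parts = files
                // Assumption: result files are named part-00000, part-00001, ...
                // Marker files such as _SUCCESS and hidden .crc files are skipped.
                .filter(p -> p.getFileName().toString().startsWith("part-"))
                .sorted()
                .collect(Collectors.toList());
        }
        StringBuilder body = new StringBuilder();
        for (Path part : parts) {
            body.append(new String(Files.readAllBytes(part), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

The returned string can then be written as the body of the HTTP response.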
This should allow you to build an HTTP server prototype, with the rumble JAR on the classpath and with just the above few lines to invoke the JSONiq query and write its output to HDFS.
Note that you may need to package the HTTP server inside a jar, and call this jar with spark-submit to make sure everything is executed within a Spark environment. The main function in the HTTP server jar can create the HTTP server and start listening on port 8080 or similar, then invoke the above code for each incoming request.
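A minimal sketch of such a main function, using the HTTP server that ships with the JDK (com.sun.net.httpserver). The /query path, the class name QueryHttpServer and the queryRunner parameter are assumptions made for illustration: queryRunner is the hook where the JsoniqQueryExecutor lines above would go, and the lambda in main is only a placeholder that does not call Rumble.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical wrapper, not part of Rumble.
final class QueryHttpServer {

    /** Starts a server that passes each POSTed body to queryRunner and returns its result. */
    static HttpServer start(int port, Function<String, String> queryRunner) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/query", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                    return;
                }
                String query = new String(readAll(exchange.getRequestBody()), StandardCharsets.UTF_8);
                int status = 200;
                byte[] body;
                try {
                    body = queryRunner.apply(query).getBytes(StandardCharsets.UTF_8);
                } catch (RuntimeException e) {
                    status = 500;
                    body = String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8);
                }
                // A length of -1 tells the server that no response body follows.
                exchange.sendResponseHeaders(status, body.length == 0 ? -1 : body.length);
                if (body.length > 0) {
                    try (OutputStream out = exchange.getResponseBody()) {
                        out.write(body);
                    }
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder runner: replace with code that saves the query, runs it and reads the output.
        start(8080, query -> "received a query of " + query.length() + " characters\n");
    }
}
```

Keeping the query execution behind a Function makes it possible to try the HTTP part on its own before wiring in Spark.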
When this works, we could extend the API to provide a more efficient function (for example, that collects the output directly to memory so you don't have to read back the output from HDFS).
Another possibility, completely different (but more encapsulated), is to have the HTTP server call Rumble via the CLI, invoking spark-submit from within Java with the rumble jar and --query-path and --output-path appropriately set. But it would be slower because the executors must be allocated and deallocated every time.
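For this CLI-based variant, a rough sketch using ProcessBuilder is shown below. The flags --query-path and --output-path are the ones mentioned above. The class name SparkSubmitInvoker is hypothetical, the jar path is supplied by the caller, and the sketch assumes that spark-submit is on the PATH and that no further Spark options are needed.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Rumble.
final class SparkSubmitInvoker {

    /** Builds the spark-submit command line for one query execution. */
    static List<String> buildCommand(String rumbleJar, String queryPath, String outputPath) {
        return Arrays.asList(
            "spark-submit", rumbleJar,
            "--query-path", queryPath,
            "--output-path", outputPath);
    }

    /** Runs the command, forwards its output to this process and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Each call to run starts a fresh Spark application, which is where the extra latency mentioned above comes from.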
@ghislainfourny it would suit my use case best if I could pass the queryText without putting it in a file and then use JsoniqQueryExecutor's run or runInteractive to get the result. Would it be feasible to extend JsoniqQueryExecutor with a method that takes the queryText as a parameter?
Absolutely. There is actually a public Java API on the way that will address this and give a simple, high-level way to execute a query and go through its results. The goal is that it can be used via a simple maven import once it is registered on maven central.
@pjfanning we now have an official Maven repository:
https://search.maven.org/search?q=g:com.github.rumbledb
and the public API is here:
http://rumbledb.org/site/apidocs/org/rumbledb/api/package-summary.html
@ghislainfourny thanks, I'll try that out
I'd be interested in trying to run a JVM process that accepts HTTP requests containing JSONiq queries and uses rumbledb code to run them. The rumbledb samples use spark-submit to run queries from the command line or from a shell. I would like to leave the HTTP server running over extended periods and create a UI that interacts with it.
I would appreciate it if someone could give me some advice on where to start.