Open GavinRay97 opened 2 years ago
There was some discussion on the mailing list in November that might help you, maybe you can collaborate with @rymarm (who sent the emails) on this...
EDIT: Oh dear, it looks like Pony Mail isn't very good at URLs. But try searching for "Start embedded Drill on JDBC connection" with a date range of "the last year" here:
Great idea. I wrote a lot of that code originally, let me know if you have questions. The dependencies might be related to things like the test-only row set classes, integrations with JUnit for temporary directories and so on. It may be possible to split the class so that one class has only those bits and pieces needed for client apps, and a subclass adds the additional parts used by tests.
@GavinRay97 several months ago I found that Drill can be run in "embedded mode" with a pretty simple configuration. To achieve this, add the following dependencies to your project's pom.xml:
<dependencies>
  <dependency>
    <groupId>org.apache.drill.exec</groupId>
    <artifactId>drill-java-exec</artifactId>
    <version>1.19.0</version>
    <exclusions>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>log4j-over-slf4j</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.apache.drill.exec</groupId>
    <artifactId>drill-jdbc</artifactId>
    <version>1.19.0</version>
    <exclusions>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>log4j-over-slf4j</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>21.0</version>
  </dependency>
</dependencies>
And after that, you will be able to run embedded Drill with the following example code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EmbeddedDrillExample {
  public static void main(String[] args) throws Exception {
    // Connecting to "jdbc:drill:zk=local" runs the embedded Drill and establishes a connection to it.
    // try-with-resources closes the result set, statement, and connection even if the query fails.
    try (Connection connection = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement st = connection.createStatement();
         // `/home/maksym/Desktop/sample.csv` is the path to a CSV file that I created for the example
         ResultSet rs = st.executeQuery("select * from dfs.`/home/maksym/Desktop/sample.csv`")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
I didn't find exhaustive information on exactly how an application should be configured: which dependencies are required, which properties are available, and so on. But you can dive into the code and look at how embedded mode is implemented. Here is a starting point: https://github.com/apache/drill/blob/15b2f52260e4f0026f2dfafa23c5d32e0fb66502/exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillConnectionImpl.java#L104
Besides this, you can also find many Jira tickets about issues with embedded Drill. Here are several of them: DRILL-2126, DRILL-1654, DRILL-1409
Based on my reading of the code and my manual tests, embedded Drill seems to work pretty well, and the only issue is dependency conflicts. That is why, in my example above, I added guava and excluded the logging jars.
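When chasing dependency conflicts like the Guava and logging ones described above, it can help to see which jar a class is actually loaded from at runtime. The following is a minimal diagnostic sketch using only the JDK; the class name ClasspathProbe and the labels it returns are made up for illustration, and the list of classes to probe is an assumption about what is likely to clash.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports where a class is loaded from, to help spot conflicting jars. */
final class ClasspathProbe {

  /** Returns the jar/directory URL for the class, or a short label if there is none. */
  static String locate(String className) {
    try {
      // initialize=false: only locate the class, do not run its static initializers.
      Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      CodeSource source = type.getProtectionDomain().getCodeSource();
      if (source == null || source.getLocation() == null) {
        // Core JDK classes such as java.lang.String have no code source.
        return "JDK runtime";
      }
      return source.getLocation().toString();
    } catch (ClassNotFoundException | LinkageError e) {
      return "not on classpath";
    }
  }

  public static void main(String[] args) {
    String[] names = {
      "com.google.common.base.Preconditions", // Guava
      "org.slf4j.LoggerFactory",              // SLF4J API
      "org.apache.log4j.Logger",              // Log4j 1.x API (real Log4j or the log4j-over-slf4j bridge)
      "org.apache.drill.jdbc.Driver"          // Drill JDBC driver
    };
    for (String name : names) {
      System.out.println(name + " -> " + locate(name));
    }
  }
}
```

Running this inside the application shows which jar supplies each class, or that it is missing. To list every version Maven pulls in, `mvn dependency:tree` is the usual complement.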
I would like to gather as much information as possible about embedded Drill and add it to the Drill documentation or make some code improvements to let users freely use this mode in their applications. Of course, Drill was created as a distributed system, but it is such a powerful tool that it is also very useful in single-node, embedded mode.
Thanks @rymarm for the info! This is one of those cases where a bug becomes a feature. The reason embedded Drill works via JDBC is that most of Drill ends up getting sucked into the JDBC driver for no good reason other than that the RPC code depends on everything else. That's lucky for you, but not so great for folks who just want a simple JDBC driver.
As it turns out, the reason that SqlLine can run an embedded Drill is that the JDBC driver contains all the code. But do we want a JDBC driver to include a Splunk connector, a PDF reader, support for Hadoop and all the rest? That creates a rather fat client, and all those libraries conflict with what the surrounding app wants to do. This is why the JDBC driver build chucks a bunch of dependencies overboard.
At some point (maybe Drill 2.0?) we need to create a simpler JDBC driver. At that point, the mechanism that @GavinRay97 originally requested will be needed to start the server that the JDBC driver then connects to. We're not there now (far from it), but that's kind of where we should head. (There is a whole vector discussion that includes this topic.)
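To make the distinction concrete: today the connection string decides whether the JDBC driver starts an embedded Drill itself or connects to a server that is already running. Below is a small sketch of the three URL forms, assuming the formats in the Drill JDBC documentation as I understand them. The class and method names are hypothetical; 31010 is Drill's default user port, and the `drill` and `drillbits1` values in the usage line are the default ZooKeeper root and cluster ID.

```java
/** Hypothetical helper that builds Drill JDBC connection URLs; not part of Drill. */
final class DrillUrls {

  /** Default Drillbit user port (assumption: cluster uses the stock configuration). */
  static final int DEFAULT_USER_PORT = 31010;

  /** Embedded mode: the driver starts a Drillbit inside the current JVM. */
  static String embedded() {
    return "jdbc:drill:zk=local";
  }

  /** Direct connection to one already-running Drillbit. */
  static String direct(String host, int port) {
    return "jdbc:drill:drillbit=" + host + ":" + port;
  }

  /** Connection through ZooKeeper, which picks a Drillbit from the cluster. */
  static String viaZooKeeper(String zkQuorum, String zkRoot, String clusterId) {
    return "jdbc:drill:zk=" + zkQuorum + "/" + zkRoot + "/" + clusterId;
  }

  public static void main(String[] args) {
    System.out.println(embedded());
    System.out.println(direct("localhost", DEFAULT_USER_PORT));
    System.out.println(viaZooKeeper("zk1:2181", "drill", "drillbits1"));
  }
}
```

Only the first form relies on the driver carrying the whole engine. The other two would still work with a slimmer driver, provided something else starts the server.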
Hello, I would like to embed Drill in a JVM application, running as a single node in-memory. I will feed it Calcite RelNode relational expressions to execute that my application is generating.
(If this isn't the best/easiest way to embed a single Drill node please let me know and I will delete this issue 😅)
Browsing the code to try to find out how best to go about this, I found in ClusterFixtureBuilder.java:
https://github.com/apache/drill/blob/2decae18b85eeda51816e92d5a9e9e6e2f9ce8d5/exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixtureBuilder.java#L29-L43
https://github.com/apache/drill/blob/2decae18b85eeda51816e92d5a9e9e6e2f9ce8d5/exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixtureBuilder.java#L279-L301
But it looks like there is no Maven artifact or .jar to download to include this functionality as an end user =/
I tried to copy-paste the primary classes, but there is a spiderweb of dependencies throughout the org.apache.drill.test and org.apache.drill.exec.testing packages.