Open amotl opened 4 months ago
Sharing a few spots that sparked my interest.
Dremio's documentation about its distributed storage subsystem also caught my attention. Other than this, I am also referencing the PostgreSQL integration here.
When trying to build https://github.com/dremio/dremio-oss, this error is raised:
mvn clean install -DskipTests
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.12.1:compile (default-compile)
on project errorprone-dremio: Compilation failure
[ERROR] [options] system modules path not set in conjunction with -source 11
$ mvn --version
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /usr/local/Cellar/maven/3.8.6/libexec
Java version: 19.0.1, vendor: Homebrew, runtime: /usr/local/Cellar/openjdk/19.0.1/libexec/openjdk.jdk/Contents/Home
Default locale: en_GB, platform encoding: UTF-8
OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
This Maven version is pretty old, but the error also happens on Maven 3.9.8, according to @karynzv.
I wondered whether my software versions might be too recent or too old, but the version requirements of the build rule that out:
<requireMavenVersion>
<version>[3.3.9,4)</version>
</requireMavenVersion>
<requireJavaVersion>
<version>[11,)</version>
</requireJavaVersion>
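To double-check the installed versions against those ranges, here is a minimal sketch in Python. It is only a simplified reading of Maven's range syntax (numeric components, one lower and one upper bound), not Maven's actual version comparison, and the helper names `parse_version` and `in_range` are made up for illustration.

```python
import re


def parse_version(text):
    """Turn '3.8.6' into (3, 8, 6); anything non-numeric is ignored."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def in_range(version, spec):
    """Check a version against a range such as '[3.3.9,4)' or '[11,)'.

    Simplified assumption: '[' / ']' mean inclusive, '(' / ')' mean
    exclusive, and an empty bound means "unbounded".
    """
    lower_inclusive = spec[0] == "["
    upper_inclusive = spec[-1] == "]"
    lower, upper = (bound.strip() for bound in spec[1:-1].split(","))
    candidate = parse_version(version)
    if lower:
        low = parse_version(lower)
        if candidate < low or (candidate == low and not lower_inclusive):
            return False
    if upper:
        high = parse_version(upper)
        if candidate > high or (candidate == high and not upper_inclusive):
            return False
    return True


# Versions taken from the `mvn --version` output above.
print(in_range("3.8.6", "[3.3.9,4)"))
print(in_range("19.0.1", "[11,)"))
```

Both the Maven and the Java version reported above fall inside the required ranges, so the version requirements themselves are not what stops the build.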
[ERROR] [options] system modules path not set in conjunction with -source 11
Adding -Derrorprone.skip makes the build progress further. From "BUILD FAILURE with -Ddremio.oss-only", we also picked up two more build options:
mvn clean install -DskipTests -Derrorprone.skip -Ddremio.oss-only=true -Dlicense.skip=true -e
[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.43.0:check (spotless-check)
on project dremio-sabot-kernel:
Unable to check file /Users/amo/dev/foss/dremio-oss/sabot/kernel/src/main/java/com/dremio/sabot/op/join/merge/MergeJoinComparatorTemplate.java:
com.google.googlejavaformat.java.FormatterException: 215:10: error: invalid use of a restricted identifier 'yield'
Pending.
@karynzv started to look into skipping the Dremio build and using its OCI image instead, building the CrateDB connector and plugging it into the image. Thanks!
Building the connector also fails, probably because it uses an outdated API.
[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1073,96] cannot find symbol
symbol: method getNessieTreeApiBlockingStub()
location: class com.dremio.exec.server.SabotContext
[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1073,42] cannot find symbol
symbol: method getNessieContentsApiBlockingStub()
location: class com.dremio.exec.server.SabotContext
[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1079,55] incompatible types: org.apache.hadoop.conf.Configuration cannot be converted to com.dremio.exec.store.iceberg.SupportsIcebergMutablePlugin
Let me document what I've tested when trying to connect with Dremio:
Community connector (https://github.com/rongfengliang/cratedb-dremio-connector)
1 - Run a Dremio docker as documented here (https://docs.dremio.com/current/get-started/docker-quickstart/)
2 - Build the community connector after removing the problematic test files in src/test/java/com/dremio/; I personally haven't tried to fix them (https://github.com/rongfengliang/cratedb-dremio-connector)
3 - Move the resulting .jar file to the docker jars/3rdparty folder and restart docker as described here
4 - Add a new source, now choosing the CRATEDB option, and configure it with your cluster access information
5 - In Dremio, query a VIEW as SELECT * FROM VIEW_NAME, which will give an error
6 - Check in CrateDB for the queries run by Dremio with SELECT * from sys.jobs_log WHERE username = <DREMIO_USER>
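For step 6, the lookup can also be scripted against CrateDB's HTTP endpoint. The sketch below only builds the request; it assumes a cluster reachable at `localhost:4200` without authentication, and the user name `dremio` is a placeholder for whatever account the Dremio source is configured with. The helper name `jobs_log_request` is made up for illustration.

```python
import json
import urllib.request


def jobs_log_request(username, url="http://localhost:4200/_sql"):
    """Build a POST request asking CrateDB for the statements run by one user.

    The URL is an assumption (local cluster, default HTTP port, no auth).
    """
    payload = {
        # `?` is a positional parameter, filled from `args` by CrateDB.
        "stmt": (
            "SELECT started, stmt, error FROM sys.jobs_log "
            "WHERE username = ? ORDER BY started DESC"
        ),
        "args": [username],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# `dremio` is a hypothetical user name; replace it with your <DREMIO_USER>.
request = jobs_log_request("dremio")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# With a running cluster, send it using urllib.request.urlopen(request).
```

Passing the user name as a parameter instead of pasting it into the statement avoids quoting mistakes.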
The default Postgres connector:
1 - Run a Dremio docker as documented here (https://docs.dremio.com/current/get-started/docker-quickstart/)
2 - Add a new source choosing Postgres and configure accordingly.
3 - Try querying the data using the connector reference; the queries will fail due to the use of COLLATE.
4 - Instead, use the approach described here and you should be able to query CrateDB directly
I did some further tests and this seems to be the recommended approach to use Dremio with CrateDB.
Instead of using a specific connector, there is the option to query CrateDB directly from Dremio as documented here. So, by using the default Postgres connector as explained above, use the following syntax to query CrateDB directly:
SELECT * FROM table(crate.external_query('SELECT o[''it''] FROM doc.test_view;'))
Further details on the syntax and usage are here.
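The doubled single quotes in the statement above are easy to get wrong by hand. As a minimal sketch, a small Python helper can do the wrapping; the name `external_query` for the helper is made up here, and `crate` is simply the name the Dremio source was given in the example above.

```python
def external_query(source, statement):
    """Wrap a native SQL statement in Dremio's external_query table function.

    Single quotes inside the statement are doubled, because the whole
    statement is passed to Dremio as one SQL string literal.
    """
    escaped = statement.replace("'", "''")
    return f"SELECT * FROM table({source}.external_query('{escaped}'))"


# Reproduces the query shown above from the plain CrateDB statement.
print(external_query("crate", "SELECT o['it'] FROM doc.test_view;"))
```

This only handles quoting; whether the inner statement is valid is still up to CrateDB.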
Hi. Thanks a stack for your reports, both how to set up a development sandbox for the community connector, and for educating us that the external queries connector works well.
What's next?
These are external queries, so called because they are passed through and run outside of Dremio.
Is it still worthwhile to continue fixing the native connector for CrateDB? One major point of Dremio is running queries inside its core engine rather than bypassing it, so that multiple data sources can be combined through its federation layer.
/cc @hlcianfagna, @karynzv, @hammerhead