crate / crate-clients-tools

Clients, tools, and integrations for CrateDB.
https://crate.io/docs/clients/
Apache License 2.0

Dremio: Unified Analytics Platform #136

Open amotl opened 1 month ago

amotl commented 1 month ago

About

OSS

Dremio - the missing link in modern data. Dremio enables organizations to unlock the value of their data.

Commercial

The Unified Lakehouse Platform for Self-Service Analytics and AI.

Dremio provides the fastest SQL engine with the best price-performance for Apache Iceberg, an Apache Iceberg catalog and Lakehouse Management service for next-gen dataops, and hybrid cloud deployment flexibility.

References

/cc @hlcianfagna, @karynzv, @hammerhead

amotl commented 1 month ago

Sharing a few spots that sparked my interest.

Search

Lucene Index

Remote Store

MongoDB, Elasticsearch, Dataplane

amotl commented 1 month ago

Dremio's documentation about its distributed storage subsystem also caught my interest. In addition, I am referencing the PostgreSQL integration here.

amotl commented 1 month ago

Problem

When trying to build https://github.com/dremio/dremio-oss, this error is raised:

mvn clean install -DskipTests
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.12.1:compile (default-compile)
        on project errorprone-dremio: Compilation failure
[ERROR] [options] system modules path not set in conjunction with -source 11

Details

$ mvn --version
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /usr/local/Cellar/maven/3.8.6/libexec
Java version: 19.0.1, vendor: Homebrew, runtime: /usr/local/Cellar/openjdk/19.0.1/libexec/openjdk.jdk/Contents/Home
Default locale: en_GB, platform encoding: UTF-8
OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"

This Maven version is fairly old, but the error also occurs with Maven 3.9.8, according to @karynzv.

Version Spec

I wondered whether my software versions were too recent or too old, but that is not the case: Maven 3.8.6 and Java 19.0.1 both fall within the version ranges the build enforces.

<requireMavenVersion>
  <version>[3.3.9,4)</version>
</requireMavenVersion>
<requireJavaVersion>
  <version>[11,)</version>
</requireJavaVersion>
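
As a rough illustration of how these range expressions read, `[3.3.9,4)` means "3.3.9 or newer, but below 4", and `[11,)` means "11 or newer". The sketch below checks the versions reported above against both ranges. The helper names `version_ge`, `maven_ok`, and `java_ok` are made up for this example, and the comparison only handles plain dot-separated numbers, so it is a simplification of Maven's real version ordering.

```shell
#!/bin/sh
# Simplified range check for plain numeric versions such as 3.8.6.
# Assumption: no qualifiers like -SNAPSHOT, which Maven itself does handle.

# Succeeds if version $1 >= version $2, comparing component by component.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0                            # all components equal
  }'
}

# [3.3.9,4): 3.3.9 <= version < 4
maven_ok() { version_ge "$1" 3.3.9 && ! version_ge "$1" 4; }

# [11,): version >= 11, no upper bound
java_ok() { version_ge "$1" 11; }

maven_ok 3.8.6 && echo "Maven 3.8.6 is within [3.3.9,4)"
java_ok 19.0.1 && echo "Java 19.0.1 is within [11,)"
```

Both checks pass for the versions listed above, which suggests the version requirements are not what breaks the build.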
amotl commented 4 weeks ago

Problem

[ERROR] [options] system modules path not set in conjunction with -source 11

Solution

Adding -Derrorprone.skip lets the build progress further. From "BUILD FAILURE with -Ddremio.oss-only", we also picked up two more build options: -Ddremio.oss-only=true and -Dlicense.skip=true.

mvn clean install -DskipTests -Derrorprone.skip -Ddremio.oss-only=true -Dlicense.skip=true -e
amotl commented 4 weeks ago

Problem

[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.43.0:check (spotless-check) 
on project dremio-sabot-kernel: 
Unable to check file /Users/amo/dev/foss/dremio-oss/sabot/kernel/src/main/java/com/dremio/sabot/op/join/merge/MergeJoinComparatorTemplate.java: 
com.google.googlejavaformat.java.FormatterException: 215:10: error: invalid use of a restricted identifier 'yield'

Solution

Pending.

amotl commented 4 weeks ago

Next

@karynzv started looking into skipping the Dremio build and using its OCI image instead, then building the CrateDB connector and plugging it into that image. Thanks!

Problem

Building the connector also fails, probably because it uses an outdated Dremio API.

[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1073,96] cannot find symbol
  symbol:   method getNessieTreeApiBlockingStub()
  location: class com.dremio.exec.server.SabotContext
[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1073,42] cannot find symbol
  symbol:   method getNessieContentsApiBlockingStub()
  location: class com.dremio.exec.server.SabotContext
[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1079,55] incompatible types: org.apache.hadoop.conf.Configuration cannot be converted to com.dremio.exec.store.iceberg.SupportsIcebergMutablePlugin
karynzv commented 3 weeks ago

Let me document what I've tested when trying to connect with Dremio:

karynzv commented 3 weeks ago

I did some further tests and this seems to be the recommended approach to use Dremio with CrateDB.

Instead of using a specific connector, there is the option to query CrateDB directly from Dremio as documented here. With the default Postgres connector set up as explained above, use the following syntax to query CrateDB directly:

SELECT * FROM table(crate.external_query('SELECT o[''it''] FROM doc.test_view;'))

Further details on the syntax and its use are available here.
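
One detail of the statement above is easy to get wrong: the CrateDB query is passed as a SQL string literal, so every single quote inside it has to be doubled (`o['it']` becomes `o[''it'']`). A minimal sketch of that wrapping step follows. The function name `dremio_external_query` is made up for this example, and `crate` is simply the source name used in the statement above.

```shell
#!/bin/sh
# Wrap a CrateDB query into a Dremio external_query statement.
# Hypothetical helper: $1 is the Dremio source name, $2 the query to pass through.
dremio_external_query() {
  source_name="$1"
  inner_query="$2"
  # Double every single quote so the query survives inside the SQL string literal.
  escaped=$(printf '%s' "$inner_query" | sed "s/'/''/g")
  printf "SELECT * FROM table(%s.external_query('%s'))\n" "$source_name" "$escaped"
}

# Reproduces the statement shown above.
dremio_external_query crate "SELECT o['it'] FROM doc.test_view;"
```

The printed statement can then be pasted into Dremio's SQL editor. The helper only escapes quotes; it does not validate the inner query.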

amotl commented 3 weeks ago

Hi. Thanks a stack for your reports, both for showing how to set up a development sandbox for the community connector, and for teaching us that the external queries connector works well.

  1. Shall we document this fact by adding an item about Dremio to crate-clients-tools and cratedb-guide?
  2. What's next?

    These external queries, so called because they are passed by and run outside of Dremio.

    Is it still worthwhile to continue fixing the native connector for CrateDB? After all, one major point of Dremio is running queries inside its core engine rather than bypassing it, so that multiple data sources can be combined through its federation layer.