Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Alluxio-Trino for Trino version 434 and higher #18612

Open jpohanka opened 1 month ago

jpohanka commented 1 month ago

Page: https://docs.alluxio.io/os/user/stable/en/compute/Trino.html

Alluxio version: 2.9.0 and higher.

Trino version: 434 and higher.

Summary: The current Alluxio+Trino documentation works for all Trino versions up to 433. With higher versions, Trino throws a "No FileSystem for scheme: alluxio" error. This is due to changes in the HDFS-related code that were introduced in Trino 434.

To solve this problem, the alluxio-<version>-client.jar file needs to be copied to the path ${Trino_HOME}/plugin/hive/hdfs/ instead of the original path ${Trino_HOME}/plugin/hive-hadoop2/.
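
As a rough illustration of that copy step, here is a minimal Python sketch. The function name `install_alluxio_client` and the example paths in the usage note are hypothetical; only the target sub-path plugin/hive/hdfs comes from the description above. It refuses to guess when the directory is missing, since that probably means an older Trino layout.

```python
import shutil
from pathlib import Path


def install_alluxio_client(client_jar: Path, trino_home: Path) -> Path:
    """Copy the Alluxio client jar into Trino's Hive HDFS plugin directory.

    Hypothetical helper: the target sub-path is the one reported to work
    for Trino 434 and higher; everything else here is an assumption.
    """
    target_dir = trino_home / "plugin" / "hive" / "hdfs"
    if not target_dir.is_dir():
        # Older Trino versions used plugin/hive-hadoop2 instead.
        raise FileNotFoundError(f"{target_dir} not found; is this Trino 434 or higher?")
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(client_jar, target_dir))
```

A call might look like `install_alluxio_client(Path("alluxio-2.9.3-client.jar"), Path("/opt/trino"))`, where both paths are placeholders for your own installation.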

realknorke commented 1 month ago

@jpohanka we tested your workaround, but it does not seem to be complete. My observations:

  1. There is no alluxio-$version-client.jar on Maven Central. We added alluxio-shaded-client-2.9.3.jar to the …/plugin/hive/hdfs folder.
  2. Beyond that, we had to COPY (in the Dockerfile) the content of the /opt/alluxio/lib folder (from an Alluxio installation/image) into the Trino image under /opt/alluxio/lib.
  3. Using version 312 instead of 2.9.3 does not work! I'm still confused about Alluxio's versioning: it has semantic versions (e.g. 2.9.4) and an incompatible sequential numbering (e.g. 312).
  4. There is an alluxio-client.jar in our Alluxio image. However, that jar file is a symlink to build/alluxio-2.9.3-hadoop2-client.jar, while we are using Hadoop3 libs.
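
To check which jar a symlink such as alluxio-client.jar actually points to, something like the following Python sketch could help. The helper name `describe_client_jar` is hypothetical, and inferring the Hadoop flavor from a "hadoopN" fragment in the file name is an assumption based only on the single file name mentioned in the last item; it does not inspect the jar contents.

```python
import re
from pathlib import Path


def describe_client_jar(jar: Path) -> dict:
    """Resolve a client jar path and guess its Hadoop flavor from the file name.

    Hypothetical helper: the 'hadoopN' naming pattern is an assumption.
    """
    real = jar.resolve()  # follows symlinks to the real file
    match = re.search(r"hadoop(\d+)", real.name)
    return {
        "is_symlink": jar.is_symlink(),
        "real_name": real.name,
        "hadoop_flavor": f"hadoop{match.group(1)}" if match else "unknown",
    }
```

Running it against the symlink from the last item would be expected to report a hadoop2 build, which is the kind of mismatch worth ruling out before copying the jar into a Trino image that uses Hadoop3 libs.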