eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0

Support Databricks Runtime 9.1 LTS #522

Closed. da-liii closed this issue 2 years ago

da-liii commented 2 years ago

https://docs.databricks.com/release-notes/runtime/9.1.html

da-liii commented 2 years ago

Here is the spark-rapids strategy for managing different Spark versions:

https://github.com/NVIDIA/spark-rapids/blob/e2cfb718b6d86da9ec9a15ba8b26db94b9fe6f3b/pom.xml#L833-L844

da-liii commented 2 years ago

It seems that NVIDIA builds spark-rapids against the DBR runtime jars: https://github.com/NVIDIA/spark-rapids/blob/branch-22.04/jenkins/databricks/build.sh

DBR 9.1 LTS is a mix of Apache Spark 3.1.2 and Apache Spark 3.2.0, so no single Apache Spark release matches it. To make DBR 9.1 LTS work, we would have to hack the compile dependencies the way NVIDIA does.

da-liii commented 2 years ago

Also, DBR 10.4 LTS (Apache Spark 3.2.1) has been released. I think we should skip DBR 9.1 LTS.
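
To make the conclusion concrete, here is a minimal Python sketch of the version decision discussed in this thread. It is only an illustration: `RuntimeTarget`, `TARGETS`, `spark_version_for`, and the keys `"9.1-lts"` and `"10.4-lts"` are hypothetical names, not part of rikai, spark-rapids, or any Databricks API. The Spark versions are the ones quoted in the comments above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeTarget:
    """Hypothetical record describing one Databricks runtime (illustration only)."""
    name: str
    # Apache Spark release to compile against; None when no single release matches.
    spark_version: Optional[str]
    note: str


# Values taken from this thread; the dictionary keys are made-up labels.
TARGETS = {
    "9.1-lts": RuntimeTarget(
        name="DBR 9.1 LTS",
        spark_version=None,
        note="mix of Apache Spark 3.1.2 and 3.2.0; would need DBR jars at compile time",
    ),
    "10.4-lts": RuntimeTarget(
        name="DBR 10.4 LTS",
        spark_version="3.2.1",
        note="matches Apache Spark 3.2.1",
    ),
}


def spark_version_for(runtime: str) -> str:
    """Return the Apache Spark version to build against for a runtime label.

    Raises KeyError for an unknown label and ValueError when the runtime
    has no single matching Apache Spark release (the DBR 9.1 LTS case).
    """
    target = TARGETS[runtime]
    if target.spark_version is None:
        raise ValueError(f"{target.name} is skipped: {target.note}")
    return target.spark_version


if __name__ == "__main__":
    print(spark_version_for("10.4-lts"))
```

Under these assumptions, asking for `"10.4-lts"` yields a plain Spark version to build against, while `"9.1-lts"` is rejected explicitly instead of being silently mapped to either 3.1.2 or 3.2.0.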