NVIDIA / spark-rapids-examples

A repo for all spark examples using Rapids Accelerator including ETL, ML/DL, etc.
Apache License 2.0
125 stars 51 forks source link

Main 2406 release #397

Closed nvliyuan closed 4 months ago

nvliyuan commented 4 months ago

update main branch for v2406 release. Please create merge commit not squash.

jlowe commented 4 months ago

This is missing #388.

nvliyuan commented 4 months ago

This is missing #388.

Hi @jlowe , do you mean this commit id?

jlowe commented 4 months ago

Yes, I mean that commit. It's weird in that I can see the commit in the commits to be merged, yet when I diff this PR against branch-24.06 there are some missing changes that I thought were coming from #388. For example:

$ git fetch https://github.com/nvliyuan/spark-rapids-examples.git main-2406-release
$ git diff branch-24.06 FETCH_HEAD
diff --git a/docs/get-started/xgboost-examples/csp/databricks/databricks.md b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
index 1377a15..2d03013 100644
--- a/docs/get-started/xgboost-examples/csp/databricks/databricks.md
+++ b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
@@ -21,8 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
 create an `init.sh` scripts with contents:   
    ```bash
    #!/bin/bash
-   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar
-   ```
+   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.04.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0.jar
 1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
    Prerequisites section.
 2. Choose the number of workers that matches the number of GPUs you want to use.
[...]

We can see from the diff above that we're rolling back changes in branch-24.06 as part of this merge of that branch to main. Not quite sure why these changes were lost despite #388 being part of the commit list in this PR (bad merge conflict resolution maybe?), but it needs to be fixed. You can do a diff like I show above to verify all of the changes between this PR and branch-24.06 make sense.

nvliyuan commented 4 months ago

Hi @jlowe, thanks for pointing it put, I just checked the diff details, basically there are three part: part1: I miss some commits while doing the merge conflict, for example:

diff --git a/docs/get-started/xgboost-examples/csp/databricks/databricks.md b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
index 1377a15..2d03013 100644
--- a/docs/get-started/xgboost-examples/csp/databricks/databricks.md
+++ b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
@@ -21,8 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
 create an `init.sh` scripts with contents:   
    ```bash
    #!/bin/bash
-   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar
-   ```
+   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.04.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0.jar
 1. Select the Databricks Runtime Version from one of the supported runtimes specified in the

part2: There are something wrong in branch-2406, since PCA demo only compatible with spark-rapids v23.04, cuspecial demo only compatible with spark-rapids v23.02, we miss this during the previous release, for example:

diff --git a/examples/ML+DL-Examples/Spark-cuML/pca/README.md b/examples/ML+DL-Examples/Spark-cuML/pca/README.md
index 1b2b22d..0c4a514 100644
--- a/examples/ML+DL-Examples/Spark-cuML/pca/README.md
+++ b/examples/ML+DL-Examples/Spark-cuML/pca/README.md
@@ -12,7 +12,7 @@ User can also download the release jar from Maven central:

 [rapids-4-spark-ml_2.12-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0-cuda11.jar)

-[rapids-4-spark_2.12-24.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar)
+[rapids-4-spark_2.12-23.04.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.04.0/rapids-4-spark_2.12-23.04.0.jar)

 Note: This demo could only work with v22.02.0 spark-ml version, and only compatible with spark-rapids versions prior to 24.06.0 . Please do not update the version in release.

@@ -49,7 +49,7 @@ It is assumed that a Standalone Spark cluster has been set up, the `SPARK_MASTER

     ``` bash
     RAPIDS_ML_JAR=PATH_TO_rapids-4-spark-ml_2.12-22.02.0-cuda11.jar
-    PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-24.06.0.jar
+    PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-23.04.0.jar

part3: it is ok to exist some snapshot version diff since we will remove all -snapshot in the main branch, for example:

diff --git a/examples/XGBoost-Examples/taxi/pom.xml b/examples/XGBoost-Examples/taxi/pom.xml
index 7dcfa92..dad8dfa 100644
--- a/examples/XGBoost-Examples/taxi/pom.xml
+++ b/examples/XGBoost-Examples/taxi/pom.xml
@@ -21,7 +21,7 @@
     <parent>
         <artifactId>sample_xgboost_examples</artifactId>
         <groupId>com.nvidia</groupId>
-        <version>0.2.3-SNAPSHOT</version>
+        <version>0.2.3</version>
     </parent>
     <modelVersion>4.0.0</modelVersion>

diff --git a/examples/XGBoost-Examples/utility/pom.xml b/examples/XGBoost-Examples/utility/pom.xml
index e84d9e2..15d5a97 100644
--- a/examples/XGBoost-Examples/utility/pom.xml
+++ b/examples/XGBoost-Examples/utility/pom.xml
@@ -21,7 +21,7 @@
     <parent>
         <artifactId>sample_xgboost_examples</artifactId>
         <groupId>com.nvidia</groupId>
-        <version>0.2.3-SNAPSHOT</version>
+        <version>0.2.3</version>
     </parent>
     <modelVersion>4.0.0</modelVersion>

For part1, I already push some commits to fix that For part2, I already draft pr #398 to fix the version issue in branch-2406 For part3, I assume we could ignore them. Now if you do a diff to compare main-2406-release and branch-2406, there should be only snapshot version diffs.

jlowe commented 4 months ago

Yes, snapshot diffs are expected. The other two need to be fixed.

Nit: Given there were a number of merge mistakes on this PR, it would be cleaner to wait for #398 to be fixed and then re-merge branch-24.06 to main in a fresh PR. We're not squashing commits for this, and all these "fix" commits will appear in the mainline release. The mainline history ideally should have a minimal number of divergent commits from the dev branches. Not must-fix for me, just would be nice.

nvliyuan commented 4 months ago

close this pr and draft new pr #401