NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[FEA] support java api for qualx tool qualification process #1356

Closed nvliyuan closed 1 month ago

nvliyuan commented 1 month ago

The Java API qualification output files cannot be used as input for the qualx tool prediction because some files are missing (features.csv...). It would be nice to keep the Java API and Python API in sync.

java API:
TOOL_JAR=/Users/yuali/Documents/sparks/qualification_tool/rapids-4-spark-tools_2.12-24.08.2.jar
java -cp $TOOL_JAR:$SPARK_HOME/jars/* \
    com.nvidia.spark.rapids.tool.qualification.QualificationMain -p $LOG

output: Image

python API:
spark_rapids qualification --platform onprem --eventlogs file:/xxx07045842-0021.zstd.inprogress

output: Image

amahussein commented 1 month ago

Thanks @nvliyuan! The Java API has nothing to do with the XGBoost prediction. The Java API only scans and analyzes the eventlog to generate raw_metrics and stage information.

The python API spark_rapids qualification internally does the following:

  1. It calls the qualification jar cmd.
  2. The jar cmd generates the directory rapids_4_spark_qualification_output.
  3. The Python wrapper reads the content of the directory rapids_4_spark_qualification_output.
  4. The Python wrapper runs prediction on the tables loaded in step 3, which generates the xgboost_predictions directory.
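
For illustration only, steps 1 to 3 above can be sketched roughly as follows. This is a minimal sketch and not the actual spark_rapids implementation: the helper names (build_qualification_cmd, run_qualification, list_qualification_outputs) are hypothetical, the tool arguments are passed through untouched, and it assumes the jar writes rapids_4_spark_qualification_output under the working directory. Step 4 (the XGBoost prediction) is left out because it is internal to the Python tool.

```python
import subprocess
from pathlib import Path

# Main class and output directory name as quoted in this issue.
QUAL_MAIN = "com.nvidia.spark.rapids.tool.qualification.QualificationMain"
QUAL_OUTPUT_DIR = "rapids_4_spark_qualification_output"


def build_qualification_cmd(tool_jar, spark_home, tool_args):
    """Step 1 (hypothetical helper): assemble the java command line.

    tool_args is forwarded verbatim, so this sketch makes no claim about
    what any individual flag means.
    """
    # The JVM expands the trailing '*' in a classpath entry itself,
    # so no shell is needed for the wildcard.
    classpath = f"{tool_jar}:{spark_home}/jars/*"
    return ["java", "-cp", classpath, QUAL_MAIN, *tool_args]


def run_qualification(tool_jar, spark_home, tool_args, work_dir):
    """Steps 1-2 (hypothetical helper): run the jar from work_dir.

    Assumption: the jar writes its output directory under the current
    working directory when no output location is specified.
    """
    cmd = build_qualification_cmd(tool_jar, spark_home, tool_args)
    subprocess.run(cmd, cwd=work_dir, check=True)
    return Path(work_dir) / QUAL_OUTPUT_DIR


def list_qualification_outputs(output_root):
    """Step 3 (hypothetical helper): list the files the jar produced."""
    out_dir = Path(output_root) / QUAL_OUTPUT_DIR
    if not out_dir.is_dir():
        raise FileNotFoundError(f"missing jar output directory: {out_dir}")
    return sorted(
        p.relative_to(out_dir).as_posix()
        for p in out_dir.rglob("*")
        if p.is_file()
    )
```

Listing the jar output this way is a quick check of which files the Java side produced before anything is handed to the prediction step.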

Can you elaborate on what the problem is, and what exactly is out of sync here?

nvliyuan commented 1 month ago

Hi @amahussein, thanks for the quick reply. Please ignore this FEA; I believe I have found the root cause of the issue. The customer wrote a UDF to run the qualx tool, but it hangs while running the spark_rapids qualification process. This is because they need to update https://github.com/NVIDIA/spark-rapids-tools/blob/14a4213d54d3035b974e6598a8418c01090755c0/user_tools/src/spark_rapids_pytools/resources/onprem-configs.json#L4 to point to their customized repo URI...