NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[BUG] python user tools should always display processed apps - even if passed GPU event logs #1126

Closed tgravescs closed 2 months ago

tgravescs commented 3 months ago

Describe the bug

I accidentally ran the Python user tools qualification against a GPU event log. The Python side just reported:

Qualification tool did not generate any valid rows

This is not very user friendly. We should always report the number of applications we processed, and then, if processing failed or had issues, report why.

It looks like rapids_4_spark_qualification_output_status.csv in the Java qualification tool output has the information needed. In this case the event log was GPU-based, so it did not generate any valid rows in rapids_4_spark_qualification_output.csv.
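
For illustration, here is a minimal sketch (not the tool's actual implementation) of how the Python wrapper could always report processed counts from that status CSV; the column names "Status", "Event Log", and "Description" are assumptions about the CSV layout:

import pandas as pd

def summarize_status(status_csv_path: str) -> None:
    # Hypothetical helper: read the qualification status CSV and always report
    # how many event logs were processed, even when the main output has no
    # valid rows (e.g. a GPU event log). Column names are assumptions.
    status_df = pd.read_csv(status_csv_path)
    statuses = status_df['Status'].str.upper()
    processed = int((statuses == 'SUCCESS').sum())
    print(f'Total event logs:      {len(status_df)}')
    print(f'Processed event logs:  {processed}')
    # Explain why the remaining logs were not processed.
    for _, row in status_df[statuses != 'SUCCESS'].iterrows():
        print(f"  {row['Event Log']}: {row['Status']} - {row.get('Description', '')}")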

amahussein commented 2 months ago

Bumped the priority to P0. Felix requested that this be fixed in 24.08.

Related issue: #1164 [BUG] Profiling Tool does not contain status info for failed event log

mattahrens commented 2 months ago

@parthosa will this also include reporting status for skipped logs (such as for Databricks Photon)?

amahussein commented 2 months ago

> @parthosa will this also include reporting status for skipped logs (such as for Databricks Photon)?

Yes, it displays the content of the CSV file (rapids_4_spark_qualification_output_status.csv), showing the status of processing each event log.

mattahrens commented 2 months ago

To confirm -- will all that info show up in the CLI output in addition to the status CSV file?

amahussein commented 2 months ago

Yes, it will show up on the console STDOUT as part of the final report. Later we will improve it to include the heuristics decision, with a brief summary of why a specific app was excluded (this will go to the console STDOUT and also be written to a CSV file in the root folder, because it combines decisions from the Python and Scala sides).
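
As a rough illustration only (the file name, columns, and join key below are hypothetical, not the tool's actual schema), combining the Scala-side status with a Python-side heuristics decision could look like:

import os
import pandas as pd

def write_combined_status(scala_status_csv: str,
                          heuristics_df: pd.DataFrame,
                          root_output_dir: str) -> pd.DataFrame:
    # Hypothetical sketch: merge the per-app status produced by the Scala tool
    # with the Python-side heuristics decision, echo it to the console, and
    # persist it as a CSV in the root output folder. 'App ID', the heuristics
    # columns, and the output file name are assumed names for illustration.
    scala_df = pd.read_csv(scala_status_csv)
    combined = scala_df.merge(heuristics_df, on='App ID', how='left')
    out_path = os.path.join(root_output_dir, 'qualification_status_combined.csv')
    combined.to_csv(out_path, index=False)
    print(combined.to_string(index=False))
    return combined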

parthosa commented 2 months ago

I discussed the problem statement with Felix. There are two problems that need to be addressed:

  1. When there is "no output", or when there is an error (e.g. cannot read file, cannot open DBFS), nothing is shown in the console.
  2. The user has to open the log file to see what happened. This is not very convenient in a notebook environment.

Problem 1

When there is "no output", or when there is an error (e.g. cannot read file, cannot open DBFS), nothing is shown in the console.

Solution

For example, when all event logs fail, are skipped, or are not found, the console output will be as follows:

- Application status report: /tools-run/qual_20240718000814_Bf322FFd/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv
Qualification tool found no records to show.

Report Summary:
----------------------  -
Total applications      3
Processed applications  0
Top candidates          0
----------------------  -
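
For illustration, a small sketch of how such a summary block could be rendered (the counting logic and the "Status" column name are assumptions; the CLI's real implementation may differ):

import pandas as pd
from tabulate import tabulate

def print_report_summary(status_csv_path: str, top_candidates: int) -> None:
    # Hypothetical sketch: derive the summary counts from the status CSV and
    # render them like the "Report Summary" block above. Assumes a "Status"
    # column whose values include SUCCESS for processed applications.
    status_df = pd.read_csv(status_csv_path)
    processed = int((status_df['Status'].str.upper() == 'SUCCESS').sum())
    rows = [
        ['Total applications', len(status_df)],
        ['Processed applications', processed],
        ['Top candidates', top_candidates],
    ]
    print('Report Summary:')
    print(tabulate(rows, tablefmt='simple'))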

Problem 2

The user has to open the log file to see what happened. This is not very convenient in a notebook environment.

Solution