[BUG] Scalable solution for output files location in the console output

parthosa commented 2 months ago

Currently, the console output shows the following lines to indicate the location of important files:

    - Summarized savings and speedups CSV report: /output/qual_20240805225850_C3c0aA4E/qualification_summary.csv
    - Intermediate output generated by tools: /output/qual_20240805225850_C3c0aA4E/intermediate_output
    - Metadata file with cluster recommendation and tuning details: /output/qual_20240805225850_C3c0aA4E/app_metadata.json
    - Application status report: /output/qual_20240805225850_C3c0aA4E/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv


This information gets pushed out of view because the TCV table is displayed after it.
Since we want users to reference these files, we should probably display these lines on the console after the TCV table.
Additionally, we could reduce the number of lines:
- We should show only two lines (summary CSV and metadata JSON).
- In case there are any failures in processing event logs (auth error, GPU event log), also show the status CSV line.
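A minimal Python sketch of the line-selection rule proposed above, assuming the caller already knows how many event logs failed (for example, by reading the status CSV). The function name `build_output_lines` and its `failed_apps` parameter are hypothetical and not part of the tool's code; the file names and labels are copied from the current console output.

```python
from typing import List


def build_output_lines(output_dir: str, failed_apps: int) -> List[str]:
    """Return the console lines that point at key output files.

    Hypothetical helper: always lists the summary CSV and the metadata JSON,
    and adds the status CSV line only when some event logs failed.
    """
    lines = [
        f"- Summarized savings and speedups CSV report: {output_dir}/qualification_summary.csv",
        f"- Metadata file with cluster recommendation and tuning details: {output_dir}/app_metadata.json",
    ]
    if failed_apps > 0:
        # Only point users at per-app status when something went wrong
        # (e.g. an auth error or a GPU event log).
        status_csv = (
            f"{output_dir}/rapids_4_spark_qualification_output/"
            "rapids_4_spark_qualification_output_status.csv"
        )
        lines.append(f"- Application status report: {status_csv}")
    return lines


if __name__ == "__main__":
    out = "/output/qual_20240805225850_C3c0aA4E"
    # In the proposal, these lines would be printed after the TCV table.
    print("\n".join(build_output_lines(out, failed_apps=1)))
```

With `failed_apps=0` the sketch emits two lines; any failure adds the status CSV as a third.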

amahussein commented 2 months ago

Thanks @parthosa! The title is very generic. Can we change the issue title to be more specific about what we are trying to do here?

That's tricky; it comes down to personal styling and preferences.

The PRD originally aimed to keep the "table" (or at least part of it) in sight, so users can see the qualified apps without scrolling.
- The footnotes have to appear immediately below the table.
- The notes section is likely to grow; upcoming features could add more comments to it.
- The same applies to the output files: they can vary and grow in number.

parthosa commented 2 months ago

Thanks @amahussein. Yes, I agree we need a scalable way to handle the output files and how they are displayed in the console. The list of important files might keep growing, making the console output more cluttered.

Based on this and other offline discussions, could we have a `results_metadata.json` that contains both `outputFiles` and `appResults` entries? (Sample attached.)

File: `results_metadata.json`

```json
{
  "outputFiles": [
    {
      "fileName": "qualification_summary.csv",
      "description": "Summary of the qualification tool run.",
      "path": "/path/qual_20240805225850_C3c0aA4E/qualification_summary.csv"
    },
    {
      "fileName": "rapids_4_spark_qualification_output_status.csv",
      "description": "Status of applications that were processed by the qualification tool.",
      "path": "/path/qual_20240805225850_C3c0aA4E/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv"
    }
  ],
  "appResults": [
    {
      "appId": "app-20240311074805-0000",
      "appName": "test_app_xxxxx",
      "eventLog": "file:/path/to/log",
      "clusterInfo": {
        "platform": "dataproc",
        "sourceCluster": {
          "driverNodeType": "n1-standard-16",
          "workerNodeType": "n1-standard-8",
          "numWorkerNodes": 9
        },
        "recommendedCluster": {
          "driverNodeType": "n1-standard-16",
          "workerNodeType": "n1-standard-32",
          "numWorkerNodes": 9,
          "gpuInfo": { "device": "nvidia-tesla-t4", "gpuPerWorker": 4 },
          "ssdInfo": { "numLocalSsds": 2 }
        }
      },
      "estimatedGpuSpeedupCategory": "Medium",
      "fullClusterConfigRecommendations": "/tools-run/qual_20240805222947_F2b32E83/rapids_4_spark_qualification_output/tuning/app-20240311074805-0000.conf",
      "gpuConfigRecommendationBreakdown": "/tools-run/qual_20240805222947_F2b32E83/rapids_4_spark_qualification_output/tuning/app-20240311074805-0000.log"
    },
    {
      "appId": "app-20240311074805-0000",
      "appName": "test_app_xxxxx",
      "eventLog": "file:/path/to/log",
      "clusterInfo": {
        "platform": "dataproc",
        "sourceCluster": {
          "driverNodeType": "n1-standard-16",
          "workerNodeType": "n1-standard-8",
          "numWorkerNodes": 9
        },
        "recommendedCluster": {
          "driverNodeType": "n1-standard-16",
          "workerNodeType": "n1-standard-32",
          "numWorkerNodes": 9,
          "gpuInfo": { "device": "nvidia-tesla-t4", "gpuPerWorker": 4 },
          "ssdInfo": { "numLocalSsds": 2 }
        }
      },
      "estimatedGpuSpeedupCategory": "Medium",
      "fullClusterConfigRecommendations": "/tools-run/qual_20240805222947_F2b32E83/rapids_4_spark_qualification_output/tuning/app-20240311074805-0000.conf",
      "gpuConfigRecommendationBreakdown": "/tools-run/qual_20240805222947_F2b32E83/rapids_4_spark_qualification_output/tuning/app-20240311074805-0000.log"
    }
  ]
}
```
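One way the sample could be produced is sketched below in Python. This is a minimal sketch, assuming each output file is described by a name, a description, and a path, and that `appResults` entries are already plain dictionaries shaped like the sample. The class `OutputFile` and the functions `build_results_metadata` and `write_results_metadata` are hypothetical names, not existing tool APIs.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class OutputFile:
    """One entry of the proposed "outputFiles" list (hypothetical class)."""
    file_name: str
    description: str
    path: str

    def to_json(self) -> Dict[str, str]:
        # Key names follow the camelCase keys used in the sample.
        return {"fileName": self.file_name,
                "description": self.description,
                "path": self.path}


def build_results_metadata(output_files: List[OutputFile],
                           app_results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble the proposed results_metadata.json document.

    A feature that produces a new file only appends an OutputFile here;
    the console output itself stays at a fixed number of lines.
    """
    return {
        "outputFiles": [f.to_json() for f in output_files],
        "appResults": app_results,
    }


def write_results_metadata(path: str, metadata: Dict[str, Any]) -> None:
    """Write the metadata to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2)


if __name__ == "__main__":
    root = "/path/qual_20240805225850_C3c0aA4E"
    files = [OutputFile("qualification_summary.csv",
                        "Summary of the qualification tool run.",
                        f"{root}/qualification_summary.csv")]
    print(json.dumps(build_results_metadata(files, []), indent=2))
```

The design point is that the list of files lives in the JSON, so adding a file is a data change rather than another console line.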

The console would then show only two lines:

    - Summarized speedups CSV report: /output/qual_20240805225850_C3c0aA4E/qualification_summary.csv
    - Additional information about output files and qualified apps: /output/qual_20240805225850_C3c0aA4E/results_metadata.json
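On the consuming side, a user or script could follow the second console line to locate any file. The Python helpers below are a hypothetical sketch that assumes the `outputFiles`/`appResults` layout from the sample; none of these function names exist in the tool.

```python
import json
from typing import Any, Dict, List, Optional


def load_results_metadata(path: str) -> Dict[str, Any]:
    """Read a results_metadata.json file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_output_file(metadata: Dict[str, Any], file_name: str) -> Optional[str]:
    """Return the path recorded for `file_name` under outputFiles, or None."""
    for entry in metadata.get("outputFiles", []):
        if entry.get("fileName") == file_name:
            return entry.get("path")
    return None


def apps_by_speedup(metadata: Dict[str, Any], category: str) -> List[str]:
    """Return the appIds whose estimatedGpuSpeedupCategory equals `category`."""
    return [app["appId"] for app in metadata.get("appResults", [])
            if app.get("estimatedGpuSpeedupCategory") == category]
```

For example, `find_output_file(load_results_metadata("/output/qual_20240805225850_C3c0aA4E/results_metadata.json"), "rapids_4_spark_qualification_output_status.csv")` would return the status CSV path, if that entry is present.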

NVIDIA / spark-rapids-tools #1264