NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[BUG] prediction module loads the profiler output twice #1073

Closed amahussein closed 3 months ago

amahussein commented 3 months ago

Describe the bug

After adding a new folder raw_metrics/app_id under rapids_4_spark_qualification_output, the following search for app IDs appears to be broken:

            # search profile sub directories for appIds
            app_ids = find_paths(
                prof, RegexPattern.app_id.match, return_directories=True
            )

The search now returns the same app directory twice: once under Qualification and once under Profiler.
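To illustrate the duplication, here is a minimal sketch, not the actual `find_paths` implementation and not the fix from the linked PR. It assumes a recursive walk that matches directory names against an app-ID regex; the helper names (`find_app_dirs`, `dedupe_by_app_id`), the regex, and the directory layout are all hypothetical. With an app-ID folder present in two places, the walk reports both, and keying on the directory base name is one possible way to collapse them.

```python
import os
import re
import tempfile

# Hypothetical app-ID pattern, used only for this illustration.
APP_ID_RE = re.compile(r"^app-\d+-\d+$")


def find_app_dirs(root):
    """Recursively collect directories whose name looks like an app ID."""
    hits = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            if APP_ID_RE.match(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def dedupe_by_app_id(paths):
    """Keep the first path seen for each app ID (the directory base name)."""
    seen = {}
    for path in paths:
        seen.setdefault(os.path.basename(path), path)
    return list(seen.values())


def build_demo_tree(root):
    """Create an assumed layout where one app ID appears in two places."""
    app_id = "app-20240604-0001"
    os.makedirs(os.path.join(root, "rapids_4_spark_profile", app_id))
    os.makedirs(
        os.path.join(
            root, "rapids_4_spark_qualification_output", "raw_metrics", app_id
        )
    )
    return app_id


with tempfile.TemporaryDirectory() as tmp:
    demo_app_id = build_demo_tree(tmp)
    all_hits = find_app_dirs(tmp)             # both directories match
    unique_hits = dedupe_by_app_id(all_hits)  # one entry per app ID
    print(len(all_hits), len(unique_hits))
```

Deduplicating by base name hides the symptom only; restricting the search to the profiler subdirectory would address the cause more directly, if that matches the intended layout.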

Another problem is that the prediction module does not find anything to load, even though the core tools generated output.

    for dataset, input_df in processed_dfs.items():
        if not input_df.empty:
            # ...
        else:
            logger.warning('Nothing to predict for dataset %s', dataset)

    2024-06-04 22:25:07,738 WARNING spark_rapids_tools.tools.model_xgboost: Nothing to predict for dataset qual_20240605032442_CD14eAde

parthosa commented 3 months ago

This has been resolved in https://github.com/NVIDIA/spark-rapids-tools/pull/1076

amahussein commented 3 months ago

This has been resolved in https://github.com/NVIDIA/spark-rapids-tools/pull/1076