NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[FEA] support task limit profiling for specified stages #11666

Closed thirtiseven closed 2 days ago

thirtiseven commented 3 weeks ago

Is your feature request related to a problem? Please describe.

Similar to https://github.com/NVIDIA/spark-rapids/issues/11082.

Currently, spark-rapids only supports profiling a whole stage, selected by stage ID; see the setting spark.rapids.profile.stages described here. For production queries with very large input data, however, a single stage can last for tens of minutes or even hours. Generating the result file can then be quite time-consuming, and executor scheduling behavior can lead to an early exit before the result file has been successfully flushed.

Describe the solution you'd like

Add another config that limits the number of tasks profiled per stage, to reduce the profiling file size.
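
To make the request concrete, here is a minimal sketch of the intended behavior, not the plugin's actual implementation. The class name `StageTaskLimiter` and the parameter `task_limit_per_stage` are hypothetical names made up for illustration. The only setting taken from this issue is spark.rapids.profile.stages, which the `stages` argument stands in for.

```python
from collections import defaultdict
from typing import Optional, Set


class StageTaskLimiter:
    """Hypothetical helper: decides whether a task should be profiled.

    `stages` stands in for the stage IDs selected for profiling;
    `task_limit_per_stage` is an assumed name for the proposed limit.
    """

    def __init__(self, stages: Set[int], task_limit_per_stage: Optional[int] = None):
        self.stages = stages
        self.task_limit_per_stage = task_limit_per_stage
        # Number of tasks already selected for profiling, keyed by stage ID.
        self._profiled = defaultdict(int)

    def should_profile(self, stage_id: int) -> bool:
        # Stages that were not selected are never profiled.
        if stage_id not in self.stages:
            return False
        limit = self.task_limit_per_stage
        # Once the per-stage limit is reached, skip the remaining tasks.
        if limit is not None and self._profiled[stage_id] >= limit:
            return False
        self._profiled[stage_id] += 1
        return True


limiter = StageTaskLimiter(stages={3}, task_limit_per_stage=2)
decisions = [limiter.should_profile(3) for _ in range(4)]
print(decisions)  # [True, True, False, False]
```

With a limit of 2, only the first two tasks of stage 3 are profiled, so the profiler could stop and flush its output early instead of waiting for the whole stage to finish. Leaving the limit unset keeps the current whole-stage behavior.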