[FEA] support task limit profiling for specified stages

Is your feature request related to a problem? Please describe.

Currently, spark-rapids only supports profiling a whole stage, selected by stage ID via the spark.rapids.profile.stages setting described here. For production queries with very large input data, a single stage can run for tens of minutes or even hours. Generating the result file then becomes time-consuming, and executor scheduling behavior can cause an early exit before the result file is successfully flushed.

Describe the solution you'd like

Add another config that limits the number of tasks profiled per stage, to reduce the profiling file size.
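
To make the request concrete, here is a rough sketch of the intended behavior, not of how spark-rapids implements profiling. The class name `TaskLimitedProfiler` and the parameters `profiled_stages` and `max_tasks_per_stage` are hypothetical and only illustrate the idea: tasks in a selected stage are profiled until a per-stage cap is reached, and later tasks are skipped.

```python
import threading
from collections import defaultdict


class TaskLimitedProfiler:
    """Hypothetical helper that decides whether a task should be profiled.

    Nothing here is an existing spark-rapids API; the names are assumptions
    made up for this illustration.
    """

    def __init__(self, profiled_stages, max_tasks_per_stage):
        self._stages = set(profiled_stages)   # stage IDs selected for profiling
        self._limit = max_tasks_per_stage     # proposed per-stage task cap
        self._counts = defaultdict(int)       # tasks already admitted, per stage
        self._lock = threading.Lock()         # tasks may start concurrently

    def should_profile(self, stage_id):
        """Return True if this task falls within its stage's profiling budget."""
        if stage_id not in self._stages:
            return False
        with self._lock:
            if self._counts[stage_id] >= self._limit:
                return False
            self._counts[stage_id] += 1
            return True


if __name__ == "__main__":
    profiler = TaskLimitedProfiler(profiled_stages=[3], max_tasks_per_stage=2)
    # Only the first two tasks of stage 3 are admitted; stage 5 is not selected.
    print([profiler.should_profile(3) for _ in range(4)])
    print(profiler.should_profile(5))
```

With a cap like this, the profile for a long-running stage would cover only the first few tasks, so the result file stays small and can be flushed well before the stage finishes.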

NVIDIA / spark-rapids

[FEA] support task limit profiling for specified stages #11666