Open tgravescs opened 2 years ago
Note: if we were to model full stage parallelism, we would have to partially simulate the Spark scheduler. If one stage becomes shorter while a stage running in parallel with it does not get much shorter, the next stage could start earlier, or it might not...
Is your feature request related to a problem? Please describe.
Since the changes to the qualification tool, we still use the aggregated task time, but we then compute a ratio from it and apply that ratio to the wall clock time. This isn't accurate, because if you have overlapping SQL queries or stages, the aggregated task time is going to be different than the wall clock time. We originally chose task time to take cluster setup out of the equation and because the ops keep track of things at the task level.
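To make the mismatch concrete, here is a minimal sketch of how summed durations and wall clock time diverge when work overlaps. The stage windows and the function names are hypothetical and are not taken from the qualification tool; the point is only that a plain sum counts overlapping time twice, while a union of intervals counts it once.

```python
from typing import List, Tuple

# (start, end) in seconds; hypothetical stage windows, not real event-log data
Interval = Tuple[float, float]


def summed_duration(intervals: List[Interval]) -> float:
    """Add up every duration, counting overlapping time once per interval."""
    return sum(end - start for start, end in intervals)


def wall_clock_duration(intervals: List[Interval]) -> float:
    """Length of the union of the intervals, so overlapping time counts once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged window.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps (or touches) the current window: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Two stages overlap in the first 100 seconds, then a third runs alone.
stages = [(0.0, 100.0), (20.0, 80.0), (100.0, 130.0)]
print(summed_duration(stages))      # 190.0
print(wall_clock_duration(stages))  # 130.0
```

In this made-up case the summed time is 190 seconds while only 130 seconds elapsed, so a ratio derived from the sum does not transfer cleanly to wall clock time.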
If we are now trying to show more wall clock times, we need to decide whether we are going to do it right. Doing it right means applying the estimates based on the way the application actually ran, taking into account overlapping stages, SQL queries, etc.
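As a rough illustration of what "based on the way the application ran" could involve, the sketch below replays a small stage graph and compares the result with the ratio approach. Everything here is an assumption made for the example: the stage names, durations, and dependency map are invented, and the model lets a stage start as soon as all of its parents finish, ignoring executor slot contention that the real Spark scheduler would impose.

```python
from typing import Dict, List


def estimate_finish_times(
    durations: Dict[str, float], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Finish time per stage, assuming a stage starts when all parents are done.

    Simplification (assumption): unlimited executors, so independent stages
    always run fully in parallel.
    """
    finish: Dict[str, float] = {}

    def finish_time(stage: str) -> float:
        if stage not in finish:
            start = max(
                (finish_time(p) for p in parents.get(stage, [])), default=0.0
            )
            finish[stage] = start + durations[stage]
        return finish[stage]

    for stage in durations:
        finish_time(stage)
    return finish


# Hypothetical app: A and B run in parallel, C waits for both.
durations = {"A": 100.0, "B": 90.0, "C": 30.0}
parents = {"C": ["A", "B"]}
before = max(estimate_finish_times(durations, parents).values())

# Suppose stage A becomes 4x faster while B and C are unchanged.
sped_up = dict(durations, A=25.0)
after = max(estimate_finish_times(sped_up, parents).values())

# Ratio approach: scale the original wall clock by the change in summed time.
ratio_estimate = before * sum(sped_up.values()) / sum(durations.values())

print(before, after, round(ratio_estimate, 1))  # 130.0 120.0 85.7
```

With these invented numbers, stage C still has to wait for the unchanged stage B, so the replayed wall clock drops only from 130 to 120 seconds, while the ratio approach predicts roughly 85.7 seconds.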