NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[FEA] audit all semaphore acquires to find empty cases #4568

Open abellina opened 2 years ago

abellina commented 2 years ago

@jlowe left a comment on https://github.com/NVIDIA/spark-rapids/issues/4392 that was not fully addressed, so I am opening this issue to track it:

Note that we should catch all the places where this (acquiring the semaphore unnecessarily) could occur, e.g. the various multi-file readers, coalesce readers, single-file readers, etc. across all input types (Parquet, ORC, CSV, etc.). Also, do we need to worry about this for shuffle, i.e. is there a case where shuffle read could grab the semaphore for no/empty batches for a task?

This is a lower priority task but it would be good to take care of this so that we don't have odd acquires of the semaphore for something that doesn't need it.
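
To make the concern concrete, here is a minimal sketch of the pattern the audit is looking for: deferring the semaphore acquire until a task actually has a non-empty batch to process. This is not the plugin's code. `TaskSemaphore`, `acquire_if_necessary`, `release_if_held`, and `read_batches` are hypothetical names used only for illustration, and a plain `threading.Semaphore` stands in for the GPU semaphore.

```python
import threading
from typing import Iterable, Iterator, List


class TaskSemaphore:
    """Hypothetical per-task handle on a shared semaphore (illustration only)."""

    def __init__(self, shared: threading.Semaphore) -> None:
        self._shared = shared
        self._held = False
        self.acquire_count = 0  # recorded so the behavior can be checked

    def acquire_if_necessary(self) -> None:
        # Acquire at most once per task; later calls are no-ops.
        if not self._held:
            self._shared.acquire()
            self._held = True
            self.acquire_count += 1

    def release_if_held(self) -> None:
        # Release only if this task actually acquired the semaphore.
        if self._held:
            self._shared.release()
            self._held = False


def read_batches(batches: Iterable[List[int]], sem: TaskSemaphore) -> Iterator[List[int]]:
    """Yield non-empty batches, touching the semaphore only when there is data."""
    try:
        for batch in batches:
            if not batch:
                # Empty batch: nothing would go to the GPU, so skip the acquire.
                continue
            sem.acquire_if_necessary()
            yield list(batch)  # stand-in for the real GPU work
    finally:
        sem.release_if_held()
```

With this shape, a task whose input is entirely empty never acquires the semaphore, while a task with at least one non-empty batch acquires it once and releases it when the reader is exhausted.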

sperlingxx commented 2 years ago

#4588

abellina commented 2 years ago

Part of this issue: https://github.com/NVIDIA/spark-rapids/issues/5058