NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[FEA] AutoTuner on GPU eventlog look at decompressed input size to determine max partition bytes #1062

Open tgravescs opened 1 month ago

tgravescs commented 1 month ago

Is your feature request related to a problem? Please describe.

The AutoTuner looks at the max input bytes to decide what to set spark.sql.files.maxPartitionBytes to, but the value it reads is the compressed size. The data is often much larger once decompressed, so the setting we recommend ends up too high.

For GPU event logs, see if we can get the uncompressed size. We may need to add another metric to the Spark RAPIDS plugin for this.
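
To illustrate why the compressed size is misleading, here is a minimal sketch of how a recommendation could be scaled down once a decompressed size is available. It is not the AutoTuner's actual logic: the function names, the expansion-ratio heuristic, and the example numbers are all assumptions made for illustration, and the decompressed size is assumed to come from a metric that does not exist yet.

```python
MIB = 1024 * 1024


def expansion_ratio(compressed_bytes: int, decompressed_bytes: int) -> float:
    """Return how much the input grows when decompressed (hypothetical helper)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return decompressed_bytes / compressed_bytes


def scale_max_partition_bytes(
    recommended_bytes: int,
    compressed_bytes: int,
    decompressed_bytes: int,
) -> int:
    """Shrink a maxPartitionBytes recommendation by the expansion ratio.

    Assumption: `recommended_bytes` was derived from the compressed input
    size, so dividing by the expansion ratio approximates the value that
    would have been chosen from the decompressed size. The ratio is clamped
    to 1.0 so the recommendation is never increased.
    """
    ratio = max(1.0, expansion_ratio(compressed_bytes, decompressed_bytes))
    return int(recommended_bytes / ratio)


# Example with made-up numbers: a task reads 100 MiB of compressed input
# that expands to 400 MiB, and the compressed-size heuristic suggested 512 MiB.
adjusted = scale_max_partition_bytes(512 * MIB, 100 * MIB, 400 * MIB)
print(f"spark.sql.files.maxPartitionBytes={adjusted}")
```

With a 4x expansion, the 512 MiB suggestion in this example drops to 128 MiB. Clamping the ratio at 1.0 is a design choice for the sketch: it keeps the adjustment from ever raising the value if the reported sizes are inconsistent.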