Kyligence / spark

customized spark for KAP use, checkout kyspark branch
Apache License 2.0

[SPARK-44305][SQL] Dynamically choose whether to broadcast hadoop conf #746

Closed 7mming7 closed 6 months ago

7mming7 commented 6 months ago

What changes were proposed in this pull request?

In the buildReaderWithPartitionValues method of ParquetFileFormat and OrcFileFormat, this change decides whether to broadcast the Hadoop configuration based on whether the user has specified any Hadoop parameters. If the user has specified none, the Hadoop configuration is not broadcast, which reduces the work done on the driver.
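The decision described above can be sketched roughly as follows. This is a simplified model, not the code from this patch: a plain Map stands in for the Hadoop Configuration, "broadcast" is modelled as an optional value that would be shipped to executors, and the class and method names (HadoopConfBroadcastSketch, plan, effectiveConf) are hypothetical. It also assumes that an executor can fall back to a locally available default configuration when nothing is broadcast.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model only: a Map stands in for Hadoop's Configuration, and an
// Optional stands in for a Spark broadcast variable. None of these names
// come from Spark or from this patch.
final class HadoopConfBroadcastSketch {

    // Outcome of planning a read on the driver: the conf to broadcast, if any.
    static final class ReadPlan {
        final Optional<Map<String, String>> broadcastConf;

        ReadPlan(Optional<Map<String, String>> broadcastConf) {
            this.broadcastConf = broadcastConf;
        }

        boolean broadcasts() {
            return broadcastConf.isPresent();
        }
    }

    // Driver side: only pay for a broadcast when the user supplied Hadoop options.
    static ReadPlan plan(Map<String, String> sessionConf, Map<String, String> userOptions) {
        if (userOptions.isEmpty()) {
            // Nothing user-specific to ship, so skip the broadcast entirely.
            return new ReadPlan(Optional.empty());
        }
        // User options override the session-level values in the merged conf.
        Map<String, String> merged = new HashMap<>(sessionConf);
        merged.putAll(userOptions);
        return new ReadPlan(Optional.of(merged));
    }

    // Executor side: use the broadcast conf when present, otherwise fall back
    // to the configuration assumed to be available locally.
    static Map<String, String> effectiveConf(ReadPlan plan, Map<String, String> localDefaultConf) {
        return plan.broadcastConf.orElse(localDefaultConf);
    }
}
```

Calling plan with an empty userOptions map yields a plan that does not broadcast, and effectiveConf then returns the local default map unchanged. With at least one user option, the merged map is what would be broadcast.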

Why are the changes needed?

SPARK-14912 introduced the ability to pass data source parameters through to read and write operations. However, the Hadoop configuration is broadcast even when the user has not specified any such parameters, and this has a significant performance impact. We therefore need to avoid broadcasting the full Hadoop configuration when the user has not specified any parameters.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tested by measuring the QPS improvement; detailed test data will be added later.