Currently, Parquet files written by Gluten get names such as
Gluten_Stage_3_TID_2124_VTID_257_0_3_0946dfb5-f773-42c9-ac8e-d4e70bede02b.parquet
which come from the default naming behavior in Velox's HiveDataSink.cpp.
https://github.com/facebookincubator/velox/pull/10903 adds a new targetFileName field to LocationHandle, so Gluten can now pass a targetFileName that includes the compression-kind suffix. This makes the file name more consistent with the Parquet file names generated by vanilla Spark.
Vanilla Spark names its Parquet files part-uuid.codec-extension.parquet. This PR names Parquet files written by Gluten gluten-part-uuid.codec-extension.parquet; the gluten prefix indicates that the file was generated by Gluten.
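As a rough illustration of the naming scheme described above, the sketch below builds a name of the form gluten-part-uuid.codec-extension.parquet. It is not the code in this PR: the function names makeTargetFileName and codecExtension are hypothetical, and the codec-to-suffix mapping is an assumption chosen for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical mapping from a compression codec name to a file-name suffix.
// The entries are illustrative assumptions, not Gluten's actual table.
std::string codecExtension(const std::string& codec) {
  if (codec == "snappy") return ".snappy";
  if (codec == "gzip") return ".gz";
  if (codec == "zstd") return ".zstd";
  return "";  // uncompressed or unrecognized codec: no codec suffix
}

// Builds gluten-part-<uuid><codec-extension>.parquet.
// The resulting string is what would be passed as targetFileName.
std::string makeTargetFileName(const std::string& uuid,
                               const std::string& codec) {
  assert(!uuid.empty());  // a file name without a uuid would not be unique
  return "gluten-part-" + uuid + codecExtension(codec) + ".parquet";
}
```

With this sketch, a snappy-compressed file with uuid 0946dfb5-f773-42c9-ac8e-d4e70bede02b would be named gluten-part-0946dfb5-f773-42c9-ac8e-d4e70bede02b.snappy.parquet, and an uncompressed file would carry no codec suffix.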