From https://spark.apache.org/docs/latest/cloud-integration.html:
For object stores whose consistency model means that rename-based commits are safe use the FileOutputCommitter v2 algorithm for performance; v1 for safety.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
The page also lists Google Cloud Storage (gs) as a safe object store. Therefore, when staging to GCS we should use the v2 algorithm, and can add the corresponding setting to the offload.env.template.bigquery template file.
We need to verify the information above is still accurate before working on this.
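
As a rough illustration of how the property quoted from the Spark page might be carried through to a job, the sketch below renders it both as spark-submit `--conf` arguments and as a JSON value that an env file entry could hold. The helper names are hypothetical, and the assumption that the template accepts Spark properties as JSON is not confirmed here; the exact variable name in offload.env.template.bigquery is deliberately left out.

```python
import json

# Property name and value are taken from the Spark cloud-integration page quoted above.
SPARK_PROPERTIES = {
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
}


def to_spark_submit_args(properties):
    """Render properties as a flat list of spark-submit --conf arguments.

    Hypothetical helper, for illustration only.
    """
    args = []
    for key, value in sorted(properties.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def to_json_value(properties):
    """Render properties as a JSON object string.

    Assumes (unverified) that the env template takes Spark properties as JSON.
    """
    return json.dumps(properties, sort_keys=True)


if __name__ == "__main__":
    print(" ".join(to_spark_submit_args(SPARK_PROPERTIES)))
    print(to_json_value(SPARK_PROPERTIES))
```

Keeping the property in one dictionary means a switch back to v1 for safety is a single-value change, which may be useful if the verification shows v2 is no longer recommended for GCS.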