spark workflow avro vs. mysql reading

What would be the performance gain from waiting until the very end to write MySQL rows and, instead, relying on Avro files as checkpoints/breakpoints through the various phases?

Moreover, what is the effect of the coalesce here? If this produces no output, I would assume that downstream steps go back up the lineage and might re-run earlier points in the code:

if write_avro:
    records_df_combine_cols.coalesce(settings.SPARK_REPARTITION)\
    .write.format("com.databricks.spark.avro").save(self.job.job_output)

Would be worth investigating performance from this angle.
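For context on the re-run concern: `coalesce` is a lazy transformation, so nothing executes until an action such as `save` runs, and any later action on a DataFrame that was never written or cached recomputes its lineage from the source. A minimal sketch of using the Avro write as a checkpoint might look like the following. The helper name `checkpoint_to_avro` and its parameters are hypothetical, not existing Combine code; only the format string and the coalesce/write chain come from the snippet above.

```python
def checkpoint_to_avro(spark, df, path, num_partitions):
    """Hypothetical helper: write df to Avro, then read it back.

    spark is assumed to be a SparkSession and df a Spark DataFrame.
    """
    # save() is the action that actually triggers the upstream computation.
    (df.coalesce(num_partitions)
       .write.format("com.databricks.spark.avro")
       .save(path))
    # The returned DataFrame's lineage starts at the Avro files, so later
    # actions read from disk instead of re-running the earlier phases.
    return spark.read.format("com.databricks.spark.avro").load(path)
```

Downstream phases would then use the returned DataFrame rather than `records_df_combine_cols`. Whether this beats reading back from MySQL is exactly what would need measuring.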

MI-DPLA / combine

spark workflow avro vs. mysql reading #223