gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

fix: Correct logic to autotune Dataproc Batches parallelism #169

Closed by nj1973 2 months ago

nj1973 commented 2 months ago

We were setting the wrong Spark property to increase Dataproc Batches parallelism. This PR adds code to auto-tune spark.executor.cores and spark.executor.instances instead.

If the user has already set either of these properties in offload.env, we leave their settings as they are and skip the auto-tuning.
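The rule described above (auto-tune executor cores and instances only when the user has set neither) can be sketched in Python as follows. This is a minimal illustration, not GOE's implementation. The function name `autotune_executor_properties`, the assumption that user-supplied Spark properties arrive as a dict of strings, the `target_parallelism` input, and the sizing rule (4 cores per executor, at least 2 instances) are all hypothetical.

```python
import math
from typing import Dict

# The two Spark properties the PR says are auto-tuned.
EXECUTOR_CORES = "spark.executor.cores"
EXECUTOR_INSTANCES = "spark.executor.instances"

# Hypothetical sizing constants, chosen for illustration only (not from GOE).
DEFAULT_CORES = 4
MIN_INSTANCES = 2


def autotune_executor_properties(
    user_props: Dict[str, str], target_parallelism: int
) -> Dict[str, str]:
    """Return a copy of user_props with executor cores/instances filled in.

    If the user already set either property, their settings are returned
    unchanged and no tuning is applied.
    """
    if EXECUTOR_CORES in user_props or EXECUTOR_INSTANCES in user_props:
        # Respect an explicit user choice: do nothing.
        return dict(user_props)

    tuned = dict(user_props)
    # Hypothetical rule: enough executors to give each parallel task a core,
    # but never fewer than MIN_INSTANCES executors.
    instances = max(MIN_INSTANCES, math.ceil(target_parallelism / DEFAULT_CORES))
    tuned[EXECUTOR_CORES] = str(DEFAULT_CORES)
    tuned[EXECUTOR_INSTANCES] = str(instances)
    return tuned


if __name__ == "__main__":
    print(autotune_executor_properties({}, 16))
    print(autotune_executor_properties({"spark.executor.cores": "8"}, 16))
```

With no user overrides and a target parallelism of 16, the sketch sets 4 cores and 4 instances. If `spark.executor.cores` is already present, the properties come back untouched, matching the "leave them to it" behaviour the PR describes.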