gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

feat: Add GOOGLE_DATAPROC_BATCHES_TTL #168

Closed by nj1973 2 months ago

nj1973 commented 2 months ago

This PR adds a new environment variable, GOOGLE_DATAPROC_BATCHES_TTL, that lets us control the Dataproc Batches TTL setting. In newer Batches versions the TTL defaults to 4 hours, and when it is breached the job is cancelled but the call still reports a successful status, which is not what we want. In v1.1 the parameter defaults to "run forever".

This PR also adds a check after the gcloud call to abort the offload if the batch state is CANCELLED or FAILED.
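For reviewers, here is a minimal sketch of the kind of state check described above. It is not the code in this PR: `check_batch_state`, `BatchFailedError` and `ABORT_STATES` are hypothetical names, and it assumes the batch state has already been obtained as a plain string after the gcloud call.

```python
# Illustrative sketch only; names below are hypothetical, not GOE's actual API.

# Batch states that should abort the offload, per the PR description.
ABORT_STATES = frozenset({"CANCELLED", "FAILED"})


class BatchFailedError(Exception):
    """Raised when a Dataproc batch ends in a state that must abort the offload."""


def check_batch_state(state: str) -> None:
    """Raise BatchFailedError if the batch state is CANCELLED or FAILED.

    Assumes `state` is the batch state as a string; any other state is
    treated as acceptable and the function returns None.
    """
    normalized = state.strip().upper()
    if normalized in ABORT_STATES:
        raise BatchFailedError(
            f"Dataproc batch ended in state {normalized}; aborting offload"
        )
```

The point of raising an exception rather than returning a flag is that a TTL-triggered cancellation cannot slip through as a success just because the gcloud command itself exited cleanly.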

I've defaulted the GOOGLE_DATAPROC_BATCHES_TTL variable to 2 days in offload.env. I'm open to debate whether that is a good number or not.