gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

Cater for Dataproc Batches TTL #164

Closed by nj1973 2 months ago

nj1973 commented 2 months ago

Newer versions of Dataproc Batches introduce a batch lifetime facility, controlled by the gcloud option --ttl, which defaults to 4 hours. If a batch exceeds its lifetime, the job is terminated, but the gcloud command still exits with a status of zero.
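A minimal sketch of passing an explicit --ttl when submitting a batch, assuming the value comes from an environment variable and falls back to a generous default. The variable name GOE_DATAPROC_BATCHES_TTL, the "2d" default, and the helper names are all hypothetical, chosen only for illustration; the duration check is a simplified version of gcloud's duration syntax.

```python
import os
import re
from typing import List, Mapping

# Hypothetical variable name; the real GOE setting may be named differently.
TTL_ENV_VAR = "GOE_DATAPROC_BATCHES_TTL"
# Assumed generous default, well above the 4-hour gcloud default.
DEFAULT_TTL = "2d"

# Simplified check for gcloud-style durations such as "30m", "12h" or "1h30m".
_TTL_PATTERN = re.compile(r"^(\d+[smhd])+$")


def batches_ttl(environ: Mapping[str, str] = os.environ) -> str:
    """Return the --ttl value: the environment override if set, else the default."""
    ttl = environ.get(TTL_ENV_VAR, "").strip() or DEFAULT_TTL
    if not _TTL_PATTERN.match(ttl):
        raise ValueError(f"Invalid {TTL_ENV_VAR} value: {ttl!r}")
    return ttl


def build_submit_command(
    batch_id: str, region: str, main_py: str, environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Build a gcloud command that submits a PySpark batch with an explicit --ttl."""
    return [
        "gcloud", "dataproc", "batches", "submit", "pyspark", main_py,
        f"--batch={batch_id}",
        f"--region={region}",
        f"--ttl={batches_ttl(environ)}",
    ]
```

The resulting list can be handed straight to subprocess.run. Validating the value up front means a typo in the environment variable fails the offload immediately rather than surfacing as a gcloud error after submission.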

In that case, the gcloud command output shows:

WARNING: Batch job is CANCELLED.

There are two requirements:

  1. Add support for controlling the --ttl option via an environment variable. Give it a generous default, longer than the gcloud default of 4 hours.
  2. Batches that end as cancelled do not currently fail the offload. We need a way to check whether a batch actually finished or was cancelled and, if it was cancelled, abort the offload.
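One possible shape for the second requirement, as a sketch only: treat a zero exit status as insufficient and check two signals, the "Batch job is CANCELLED" warning quoted above and the batch state reported by `gcloud dataproc batches describe`. The exception class and function names are hypothetical. The gcloud warning may be written to stderr, so the sketch assumes the caller passes the combined stdout and stderr of the submit command.

```python
import subprocess

# Warning text gcloud prints when a batch is cancelled (quoted in this issue).
CANCELLED_WARNING = "Batch job is CANCELLED"


class BatchNotSucceededError(RuntimeError):
    """Hypothetical exception raised to abort the offload."""


def describe_batch_state(batch_id: str, region: str) -> str:
    """Return the batch's state, e.g. SUCCEEDED or CANCELLED (requires gcloud on PATH)."""
    result = subprocess.run(
        [
            "gcloud", "dataproc", "batches", "describe", batch_id,
            f"--region={region}",
            "--format=value(state)",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def check_batch_outcome(gcloud_output: str, state: str) -> None:
    """Raise unless the batch genuinely succeeded, regardless of gcloud's exit status."""
    if CANCELLED_WARNING in gcloud_output:
        raise BatchNotSucceededError("gcloud reported the batch as CANCELLED")
    if state != "SUCCEEDED":
        raise BatchNotSucceededError(f"Dataproc batch ended in state {state or 'UNKNOWN'}")
```

Requiring an explicit SUCCEEDED state, rather than only looking for CANCELLED, also catches FAILED or unexpected states that the warning text would miss.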