gluent / goe

GOE: a simple and flexible way to copy data from an Oracle Database to Google BigQuery.
Apache License 2.0

Support writing logs to cloud storage #177

Closed nj1973 closed 3 weeks ago

nj1973 commented 1 month ago

This is a stepping stone towards making the tool more appropriate for running in a cloud setting. For example, if running via Docker we need the logs to be available to the user outside of the container, and if we move towards supporting a Python API interface we need to support running without shell/local OS interactions.

nj1973 commented 3 weeks ago

There is already configuration to support a custom path: OFFLOAD_LOGDIR.

OFFLOAD_LOGDIR ends up in OrchestrationConfig.log_path.

We might be able to utilise the "fsspec[gcs]" Python package to simplify this work.
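A minimal sketch of what that could look like, assuming the fsspec[gcs] extra is installed; the bucket and file names are hypothetical:

```python
import fsspec

# OFFLOAD_LOGDIR could hold either a local directory or a gs:// URL;
# fsspec dispatches on the URL scheme, so the write code stays the same
# for local disk and GCS.
log_path = "gs://example-goe-logs/offload_example.log"  # hypothetical

with fsspec.open(log_path, "w") as fh:
    fh.write("offload step 1 complete\n")
```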

Touchpoints:

nj1973 commented 3 weeks ago

It probably makes sense to limit this to GCS initially but keep other Cloud Storage providers in mind.
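One way to do that, sketched below: accept only gs:// (and plain local paths) for now, but drive everything off the URL scheme so adding s3:// or abfs:// later is a small change. The set and function names are hypothetical, not existing GOE code.

```python
from fsspec.utils import infer_storage_options

SUPPORTED_REMOTE_PROTOCOLS = {"gs"}  # extend later, e.g. {"gs", "s3", "abfs"}

def log_path_protocol(log_path: str) -> str:
    """Return the storage protocol for an OFFLOAD_LOGDIR-style path."""
    protocol = infer_storage_options(log_path)["protocol"]
    if protocol != "file" and protocol not in SUPPORTED_REMOTE_PROTOCOLS:
        raise ValueError(f"Unsupported log storage protocol: {protocol}")
    return protocol

assert log_path_protocol("/tmp/offload/log") == "file"
assert log_path_protocol("gs://example-bucket/logs") == "gs"
```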

nj1973 commented 3 weeks ago

Query Import writes an Avro/Parquet file to the local log directory before copying it to Cloud Storage. We either need a different location if the log directory is not local or, ideally, we should just write to the Cloud Storage location directly.
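A sketch of the direct-write option, assuming pyarrow and gcsfs are available; the staging path and table contents are hypothetical:

```python
import fsspec
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2, 3], "payload": ["a", "b", "c"]})

# fsspec hands pyarrow a writable file-like object on GCS, so no local
# temp file is needed and the separate copy step goes away.
with fsspec.open("gs://example-staging/query_import.parquet", "wb") as fh:
    pq.write_table(table, fh)
```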

nj1973 commented 3 weeks ago

The gcsfs package we've used only writes to GCS when there is a block size (256 KB) worth of data to be logged or when the file is closed. This means logs are not updated frequently. This might not be an issue; we need to have a think about whether it matters or not.
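To make the behaviour concrete, a sketch of that buffering in gcsfs (bucket name hypothetical); nothing becomes visible in the bucket until a full block has accumulated or the file is closed:

```python
import gcsfs

fs = gcsfs.GCSFileSystem()  # assumes default application credentials

# Writes are buffered client-side until block_size bytes accumulate
# or the file is closed.
with fs.open("example-bucket/goe/offload.log", "wb", block_size=256 * 1024) as fh:
    fh.write(b"log line\n")  # buffered in memory, not yet in GCS
    fh.flush()               # still no upload: buffer is under one block
# close() uploads whatever remains in the buffer
```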