Closed nj1973 closed 3 weeks ago
There is already configuration to support a custom path: `OFFLOAD_LOGDIR`. `OFFLOAD_LOGDIR` ends up in `OrchestrationConfig.log_path`.
We might be able to utilise the `fsspec[gcs]` Python package to simplify this work.
Touchpoints:

- `src/goe.init_log()`: Sets global file handle `log_fh`
- `src/orchestration_config.__init__()`: We need some validation of `log_path`
- `src/listener/api/routes/orchestration.command_log_path.get_command_execution_log()`: Uses regular file methods
- `src/offload/offload_messages.init_log()`: Sets file handle `log_fh`
- `src/offload/offload_transport_functions.schema_paths()`: This function appears to use the log path to write a temporary Avro schema file
- `src/util/goe_log.py`: Are these routines still used?
- `test_cli_api.get_log_path()`: Appears to be redundant.

It probably makes sense to limit this to GCS initially but keep other Cloud Storage providers in mind.
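The `log_path` validation mentioned for `src/orchestration_config.__init__()` could start as a simple scheme check. A minimal sketch, assuming we limit cloud support to GCS initially; the helper name `validate_log_path` and the accepted scheme set are illustrative, not existing code:

```python
from urllib.parse import urlparse


def validate_log_path(log_path: str) -> str:
    """Accept a plain local directory or a gs:// URL (hypothetical helper).

    Limiting to GCS initially, per the note above; other providers
    (s3://, abfs://, ...) could be added to the allowed set later.
    """
    scheme = urlparse(log_path).scheme
    if scheme in ("", "file", "gs"):
        return log_path
    raise ValueError(f"Unsupported log path scheme: {scheme!r}")
```

A local path like `/var/log/goe` parses with an empty scheme and passes through unchanged, so existing `OFFLOAD_LOGDIR` values keep working.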
Query Import writes an Avro/Parquet file to the local log directory before copying it to Cloud Storage. We either need a different location if the log directory is not local or, ideally, we just write to the Cloud Storage location directly.
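Writing directly to the Cloud Storage location could lean on `fsspec.open()`, which dispatches on the URL scheme: a plain path uses the local filesystem and a `gs://` URL uses `gcsfs` when `fsspec[gcs]` is installed. A sketch under that assumption; the function name and paths are illustrative:

```python
import fsspec


def write_staging_file(path: str, payload: bytes) -> None:
    # fsspec.open() returns a file-like object regardless of backend,
    # so the same code path serves /local/log/dir and gs://bucket/dir.
    with fsspec.open(path, mode="wb") as fh:
        fh.write(payload)


# e.g. write_staging_file("gs://my-bucket/goe/log/query_import.avro", avro_bytes)
```

This would remove the local-write-then-copy step, at the cost of making the Query Import code depend on the `fsspec` abstraction rather than plain file handles.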
The gcsfs package we've used only writes to GCS once there is a block size (256k) worth of buffered data to be logged or when the file is closed. This means logs are not updated frequently. This might not be an issue; we need to have a think about whether it matters or not.
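If fresher logs do turn out to matter, one pattern is to open, write, and close per message instead of holding a long-lived handle, since closing is what forces the upload. A stdlib-only sketch of the trade-off; with gcsfs this would mean one object write per flush (GCS objects are immutable, so a true append would need re-uploading or object composition), so this illustrates the visibility/throughput trade rather than a drop-in fix:

```python
def append_log_line(path: str, line: str) -> None:
    # Opening and closing around each write flushes the message to the
    # backing store immediately, trading write throughput for the log
    # being readable while the orchestration command is still running.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```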
This is a stepping stone to making the tool more appropriate for running in a cloud setting. For example, if running via Docker we need the logs to be available to the user outside of the container, and if we move towards supporting a Python API interface we need to support running without shell/local OS interactions.