bossie closed this issue 1 year ago
Ported the envar-based approach to k8s.
Would these files need to get synced up? https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/batch_job_log4j2.xml https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/scripts/batch_job_log4j2.xml
For example, copying over `<Logger name="org.apache.spark.scheduler.TaskSetManager" level="warn"/>`.
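For context, a sketch of where that copied line would sit in the k8s `batch_job_log4j2.xml` (the surrounding configuration is elided and only illustrative, not the actual file contents):

```xml
<!-- Minimal sketch: the TaskSetManager logger goes inside <Loggers>,
     alongside the other per-package level overrides. -->
<Configuration>
  <Loggers>
    <!-- ... existing loggers ... -->
    <Logger name="org.apache.spark.scheduler.TaskSetManager" level="warn"/>
  </Loggers>
</Configuration>
```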
Sure, it makes sense to apply your logging enhancements to CDSE too.
Fixed:
Tested with:
```python
import openeo

connection = openeo.connect("https://openeo-staging.creo.vito.be").authenticate_oidc("CDSE")

data_cube = (connection.load_collection("SENTINEL3_OLCI_L1B")
             .filter_bands(["B02", "B17", "B19"])
             .filter_bbox([2.59003, 51.069, 2.8949, 51.2206])
             .filter_temporal(["2018-08-06T00:00:00Z", "2018-08-06T00:00:00Z"])
             .reduce_dimension("t", reducer="mean"))

data_cube.execute_batch("/tmp/test_cdse_sentinel3_olci_staging_batch.tif",
                        job_options={"logging-threshold": "debug"})
```
Using MDC to attach user ID, job ID, etc. (https://github.com/Open-EO/openeo-geotrellis-extensions/issues/64) did not seem to cover all log entries, in particular Spark `TaskSetManager` logs. Unlike the OpenEO web app, a batch job can grab these values from environment variables, so in this case we decided to go back to the more reliable envar-based approach. This meant different Log4j 2 configuration files for the web app and batch jobs on Terrascope: `log4j2.xml`, which uses `OpenEOJsonLogLayout.json`, and `batch_job_log4j2.xml`, which uses `classpath:OpenEOBatchJobJsonLogLayout.json`, respectively. Code that sets up MDC has since been removed from batch jobs, but k8s hasn't been adapted yet to use the envar-based approach, effectively removing user ID and job ID from log entries in batch jobs.
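To illustrate the idea (not the actual driver code): the envar-based approach means every log record, including ones emitted by third-party code such as Spark, gets the user ID and job ID attached from the process environment rather than from per-thread MDC state. A minimal Python sketch, where the variable names `OPENEO_USER_ID` and `OPENEO_BATCH_JOB_ID` are assumptions for illustration:

```python
# Sketch of envar-based log enrichment: a logging.Filter pulls user/job IDs
# from environment variables and attaches them to every record, so even
# records from third-party libraries carry them. The env var names below
# (OPENEO_USER_ID, OPENEO_BATCH_JOB_ID) are hypothetical.
import json
import logging
import os


class EnvarContextFilter(logging.Filter):
    """Attach user_id and job_id from the environment to each log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.user_id = os.environ.get("OPENEO_USER_ID", "")
        record.job_id = os.environ.get("OPENEO_BATCH_JOB_ID", "")
        return True  # never drop records, only enrich them


def make_json_record(record: logging.LogRecord) -> str:
    """Render an enriched record the way a JSON log layout would."""
    return json.dumps({
        "message": record.getMessage(),
        "user_id": record.user_id,
        "job_id": record.job_id,
    })
```

Because the IDs come from the environment of the whole batch-job process, this sidesteps the MDC problem where records logged on threads that never had MDC set up (e.g. Spark scheduler threads) come through without user/job context.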