Closed: muttcg closed this issue 4 months ago
To take into consideration for this task:
Filters could be implemented as new Elasticsearch fields produced by the Vector transformations for each service.
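As a rough sketch of that idea (the field names and enrichment logic here are hypothetical illustrations, not the actual Vector configuration), a transform would derive per-service filter fields on each log event before it is indexed into Elasticsearch:

```python
# Hypothetical sketch: derive filterable fields from a raw log event,
# mimicking what a Vector remap transform could emit as new
# Elasticsearch fields. Field names are assumptions, not a real schema.
def enrich_log_event(event: dict, service: str) -> dict:
    enriched = dict(event)
    enriched["service"] = service  # e.g. "spark", "hbase", "trino"
    # Normalize the log level so it can be filtered on consistently.
    enriched["level"] = str(event.get("level", "INFO")).upper()
    return enriched

sample = {"message": "job started", "level": "info"}
print(enrich_log_event(sample, "spark"))
```

In the real setup the equivalent logic would live in a Vector transform (e.g. a `remap` step), with one enrichment per service.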
Logging for HBase, Hadoop, Trino, ZooKeeper, and Spark (outside of Downloads) has been added to the Stackable charts and Airflow.
https://github.com/gbif/gbif-configuration/pull/15 and https://github.com/gbif/stackable-spark/pull/2 contain the changes that add logging to the pipelines Spark jobs. Before they can be merged, they require version 23.11 of the operator.
There is a conflict between YuniKorn and running a sidecar pod for logging. When the template uses the YuniKorn driver and executor pod overrides to add yunikorn.apache.org/task-groups together with a logging section, the job gets stuck in the Pending state. Removing either the pod override or the logging section resolves the issue.
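To make the shape of the conflict concrete (container names, images, and the task-group payload below are assumptions for illustration, not the actual DAG or chart code), the problematic override combines the YuniKorn gang-scheduling annotation with a logging sidecar in the same pod spec:

```python
import json

# Illustrative sketch only: a driver pod override carrying both the
# yunikorn.apache.org/task-groups annotation and a logging sidecar.
# With both pieces present, the job was observed to hang in Pending;
# dropping either one let it run.
def build_driver_override(task_groups: list, with_logging: bool) -> dict:
    override = {
        "metadata": {
            "annotations": {
                # YuniKorn expects the task-group definition as JSON.
                "yunikorn.apache.org/task-groups": json.dumps(task_groups),
            }
        },
        "spec": {"containers": []},
    }
    if with_logging:
        # Hypothetical sidecar shipping logs out of the pod.
        override["spec"]["containers"].append(
            {"name": "log-shipper", "image": "timberio/vector:latest"}
        )
    return override

print(build_driver_override([{"name": "spark-driver", "minMember": 1}], True))
```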
The conflict is resolved in https://github.com/gbif/gbif-airflow-dags/pull/9.
Some logs are available through Airflow, but not all. We need comprehensive logs with a searchable history, perhaps using ELK or another approach that is common on Kubernetes.