Open-EO / openeo-geopyspark-integrationtests

Integration tests for the GeoPySpark backend
Apache License 2.0

include tests for logs #3

Open bossie opened 1 year ago

bossie commented 1 year ago

Quite a bit of effort has gone into putting centralized logging in place across all components that make up our openEO back-ends: in Python and Java, in Spark drivers and executors, and in web app and batch job contexts, with IDs such as user ID, request ID and job ID to correlate log entries.

Recently, however, the logging infrastructure itself (Filebeat, Logstash, etc.) has been somewhat unreliable for reasons that are not entirely understood, and logs were not visible in Kibana.

Maybe the current integration tests can be extended to make sure that our logging still behaves correctly, both at the application level and at the infrastructure level.

For batch jobs, the /jobs/{job_id}/logs endpoint can be used.
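A test along those lines could assert on the shape and content of the endpoint's response. A minimal sketch, assuming the openEO API response shape `{"logs": [{"id": ..., "level": ..., "message": ...}], "links": [...]}`; the sample data below is illustrative, not taken from a real back-end:

```python
from typing import List


def extract_messages(logs_response: dict, min_level: str = "error") -> List[str]:
    """Return the messages of log entries at or above the given severity level."""
    levels = ["debug", "info", "warning", "error"]
    threshold = levels.index(min_level)
    return [
        entry["message"]
        for entry in logs_response.get("logs", [])
        if levels.index(entry.get("level", "debug")) >= threshold
    ]


# Example response as a /jobs/{job_id}/logs endpoint might return it:
sample = {
    "logs": [
        {"id": "1", "level": "info", "message": "job started"},
        {"id": "2", "level": "error", "message": "band not found"},
    ],
    "links": [],
}
assert extract_messages(sample) == ["band not found"]
assert extract_messages(sample, min_level="info") == ["job started", "band not found"]
```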

For synchronous requests this is not so simple: the request ID is not propagated to the client unless the request fails, in which case it is returned in the error message. It is therefore not possible to look up the logs that pertain to a particular successful request. Even if we always returned the request ID in a response header, regardless of the outcome, one question remains: how would the openeo-python-client expose this request ID so we can fetch the corresponding logs?
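One direction would be to let the client generate the ID itself and send it along, so it never depends on the response at all. A hypothetical sketch; the header name `X-Request-ID` is an assumption, not something confirmed in this thread:

```python
import uuid
from typing import Optional, Tuple


def with_request_id(headers: Optional[dict] = None) -> Tuple[dict, str]:
    """Return a copy of `headers` with a freshly generated request ID attached,
    plus the ID itself, so the caller can later search the logs for it."""
    request_id = str(uuid.uuid4())
    headers = dict(headers or {})
    headers["X-Request-ID"] = request_id  # hypothetical header name
    return headers, request_id


headers, rid = with_request_id({"Authorization": "Bearer ..."})
assert headers["X-Request-ID"] == rid
assert headers["Authorization"] == "Bearer ..."
```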

bossie commented 1 year ago

We might consider a pragmatic approach and say: this synchronous request that we started a minute ago succeeded, so the logs of the last minute should contain these log entries.
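The pragmatic approach amounts to searching the centralized logs within a time window around the request, filtered by user. A sketch of building such a query; the Elasticsearch-style field names (`@timestamp`, `user_id`) are assumptions about the logging setup, not confirmed here:

```python
import datetime as dt


def log_window_query(user_id: str, start: dt.datetime, end: dt.datetime) -> dict:
    """Build an Elasticsearch-style query for a user's log entries in [start, end]."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"range": {"@timestamp": {
                        "gte": start.isoformat(),
                        "lte": end.isoformat(),
                    }}},
                ]
            }
        }
    }


# Window covering the minute in which the synchronous request ran:
start = dt.datetime(2023, 1, 1, 12, 0, 0)
query = log_window_query("alice", start, start + dt.timedelta(minutes=1))
assert query["query"]["bool"]["filter"][0]["term"]["user_id"] == "alice"
```

The obvious trade-off is that the window also catches unrelated entries from the same user, so the test can only assert that the expected entries are present, not that they are the only ones.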

soxofaan commented 1 year ago

some other quick ideas:

bossie commented 1 year ago

How would you allow the client (programmer) to set the request header? Keep it in a global or per-connection variable, or are there better ways?

soxofaan commented 1 year ago

you can do both:
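The rest of this reply appears to be truncated in the capture. A sketch of what combining both could look like: a per-connection default plus a global override via a context variable. The `Connection` class and header name are illustrative, not the actual openeo-python-client API:

```python
import contextvars
from typing import Optional

# Global (per-context) request ID, settable independently of any connection.
request_id_var = contextvars.ContextVar("request_id", default=None)


class Connection:
    """Minimal stand-in for a client connection with default headers."""

    def __init__(self):
        self.default_headers: dict = {}  # per-connection variable

    def request_headers(self, extra: Optional[dict] = None) -> dict:
        """Merge per-connection defaults, the global context variable,
        and per-request extras, in increasing order of precedence."""
        headers = dict(self.default_headers)
        rid = request_id_var.get()
        if rid:
            headers["X-Request-ID"] = rid  # hypothetical header name
        headers.update(extra or {})
        return headers


conn = Connection()
conn.default_headers["X-Request-ID"] = "per-connection-id"
assert conn.request_headers()["X-Request-ID"] == "per-connection-id"

request_id_var.set("global-id")  # the global setting takes precedence
assert conn.request_headers()["X-Request-ID"] == "global-id"
```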