Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

gdal.py cannot use logging library with default multiprocessing #906

Open JeroenVerstraelen opened 3 hours ago

JeroenVerstraelen commented 3 hours ago

Description:

We've encountered an issue with deadlocks arising from an interaction between Python's multiprocessing and logging libraries in gdal.py. These deadlocks occur sporadically, leading to batch jobs running significantly longer than expected or getting stuck entirely.

Problem:

Next Steps: The ultimate solution will involve refactoring the module to reduce the scope of multithreading, but this will require more extensive changes.

Current Workaround: Temporarily, we've opted to remove logging to avoid the deadlocks until we can rewrite the multithreaded parts of the module.

Commit that turned off logging in gdal.py: https://github.com/Open-EO/openeo-geopyspark-driver/commit/c1b8676b3c93183082c93b97601cabe38e643f34
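
For reference, one commonly suggested way to keep logging while avoiding this class of deadlock is to start workers with the "spawn" start method instead of forking. The sketch below is illustrative only and is not the code in gdal.py. It assumes that "default multiprocessing" in the title refers to the fork start method, and the names `make_pool_context`, `init_worker_logging`, `read_metadata` and `run` are hypothetical.

```python
import logging
import multiprocessing


def make_pool_context():
    """Return a multiprocessing context that does not fork the parent process."""
    # Assumption: "spawn" is acceptable for this workload. It starts a fresh
    # interpreter, so a worker cannot inherit a lock that some parent thread
    # happened to hold at the moment of fork().
    return multiprocessing.get_context("spawn")


def init_worker_logging(level=logging.INFO):
    """Pool initializer: configure logging inside each newly started worker."""
    logging.basicConfig(
        level=level,
        format="%(processName)s %(levelname)s %(message)s",
    )


def read_metadata(path):
    """Hypothetical stand-in for the per-file work done in gdal.py."""
    logging.getLogger(__name__).info("reading %s", path)
    return len(path)


def run(paths, processes=2):
    """Map read_metadata over paths in spawned worker processes."""
    ctx = make_pool_context()
    with ctx.Pool(processes=processes, initializer=init_worker_logging) as pool:
        return pool.map(read_metadata, paths)
```

`run(...)` would need to be called from a script guarded by `if __name__ == "__main__":`, because spawned workers re-import the main module. Spawn has a higher startup cost than fork, so whether this trade-off fits the batch jobs here is an open question.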

bossie commented 2 hours ago

Restored a couple of log entries (poorly) in the context of https://github.com/eu-cdse/openeo-cdse-infra/issues/278; they use print() rather than a logger.