Batch job j-24031991c78040e482e3a02fd464c3af generated 6 MB of result metadata, most of which is taken up by "derived_from" links (17694 of them). That doesn't fit in a ZNode (ZooKeeper's default limit is about 1 MB), resulting in the familiar ZK ConnectionLoss in the job tracker.
Maybe we should revisit/unify the way batch job result metadata is persisted and retrieved (currently a mix of a ZK/ES document and the job_metadata.json file). This would benefit the ZK as well as the EJR case.
So the idea would be that the job registry only stores pure batch job metadata, and that the batch job result data and result metadata are kept separate from it.
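
A minimal sketch of that separation, to make the idea concrete. It is not the driver's actual code: the key set `RESULT_METADATA_KEYS` and the helpers `split_job_document` and `persist_result_metadata` are hypothetical names, and which fields count as "result metadata" is an assumption. Only the `job_metadata.json` file name comes from the discussion above.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple

# Assumption: these keys are treated as result metadata; the real set may differ.
RESULT_METADATA_KEYS = {"links", "assets", "bbox", "geometry", "epsg"}


def split_job_document(document: dict) -> Tuple[dict, dict]:
    """Split a combined document into (pure job metadata, result metadata)."""
    job = {k: v for k, v in document.items() if k not in RESULT_METADATA_KEYS}
    results = {k: v for k, v in document.items() if k in RESULT_METADATA_KEYS}
    return job, results


def persist_result_metadata(results: dict, job_dir: Path) -> Path:
    """Write the result metadata to a file in the job directory."""
    path = job_dir / "job_metadata.json"
    path.write_text(json.dumps(results), encoding="utf-8")
    return path


# Synthetic combined document, as it might end up in a job registry today.
combined = {
    "job_id": "j-example",
    "status": "finished",
    "links": [{"rel": "derived_from", "href": "https://example.test/product/1"}],
    "bbox": [0, 0, 1, 1],
}

# Only job_doc would go to the registry (ZK/EJR); result_doc goes to the file.
job_doc, result_doc = split_job_document(combined)
with tempfile.TemporaryDirectory() as tmp:
    stored = persist_result_metadata(result_doc, Path(tmp))
    reloaded = json.loads(stored.read_text(encoding="utf-8"))
```

With this split, the registry document stays small regardless of how many links a job produces, and the result metadata has a single source of truth.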
For this particular job, the problem might be solved by simply not patching the links: https://github.com/Open-EO/openeo-python-driver/blob/39dfaa415d42fb014bedc84a7a935cb817bca09d/openeo_driver/views.py#L1033
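
As a rough illustration of why leaving out the links would help, here is a sketch that drops "derived_from" links from a metadata document and checks the serialized size against a limit. The helpers `strip_derived_from` and `fits_in_znode` are hypothetical and not part of either driver. The `rel`/`href` link layout and the 1 MB limit (ZooKeeper's default) are assumptions, and the data is synthetic.

```python
import json

# Assumption: ZooKeeper's default znode size limit is roughly 1 MB.
ZNODE_LIMIT_BYTES = 1024 * 1024


def strip_derived_from(result_metadata: dict) -> dict:
    """Return a copy of the metadata without "derived_from" links."""
    links = result_metadata.get("links", [])
    kept = [link for link in links if link.get("rel") != "derived_from"]
    return {**result_metadata, "links": kept}


def fits_in_znode(document: dict, limit: int = ZNODE_LIMIT_BYTES) -> bool:
    """Check whether the JSON-serialized document stays within the size limit."""
    return len(json.dumps(document).encode("utf-8")) <= limit


# Synthetic metadata with many bulky "derived_from" links plus one other link.
metadata = {
    "bbox": [0, 0, 1, 1],
    "links": [
        {
            "rel": "derived_from",
            "href": f"https://example.test/product/{i}",
            "title": "x" * 300,
        }
        for i in range(5000)
    ]
    + [{"rel": "self", "href": "https://example.test/job"}],
}

slim = strip_derived_from(metadata)
print(fits_in_znode(metadata), fits_in_znode(slim))
```

The original document is left untouched, so the full link list can still be written elsewhere (for example to the job_metadata.json file) while only the slim version goes to the registry.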
Related: https://github.com/Open-EO/openeo-geopyspark-driver/blob/6625156fb59d2de83e3b6d487cf54c6f2a17c526/openeogeotrellis/job_tracker_v2.py#L554
https://github.com/Open-EO/openeo-python-driver/issues/190