datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

fix(ingest/partitionExecutor): Fetch ready items for non-empty batch when _pending is empty #11885

Closed: asikowitz closed this pull request 3 days ago

asikowitz commented 3 days ago

Previously, the batch partition executor could stall for min_process_interval (default 30s) when the current batch was not empty but _pending had already been drained into pending_key_completion. This change re-calculates the batch whenever we can't read from _pending, i.e. when _pending has been cleared out and the main ingestion process is blocked.
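To make the fixed control flow concrete, here is a simplified, single-threaded model of the batch-building loop. It is a sketch under stated assumptions, not DataHub's implementation: items are assumed to be (key, payload) tuples, and build_batch, take_ready_items, holding (standing in for pending_key_completion) and in_flight are hypothetical names. When a read from the pending queue comes back empty, the loop re-checks the holding list and returns whatever batch it has, rather than waiting out another min_process_interval.

```python
from __future__ import annotations

import queue
from typing import List, Set, Tuple

# Hypothetical work item: (partition_key, payload). Not DataHub's real type.
Item = Tuple[str, int]


def take_ready_items(holding: List[Item], in_flight: Set[str], limit: int) -> List[Item]:
    """Move up to `limit` items whose key is not in flight out of `holding`.

    Preserves per-key order: once a key is taken, later items with the same
    key stay in `holding` until that key completes.
    """
    ready: List[Item] = []
    remaining: List[Item] = []
    for item in holding:
        key = item[0]
        if len(ready) < limit and key not in in_flight:
            in_flight.add(key)
            ready.append(item)
        else:
            remaining.append(item)
    holding[:] = remaining  # update the caller's list in place
    return ready


def build_batch(
    pending: queue.Queue[Item],
    holding: List[Item],
    in_flight: Set[str],
    max_per_batch: int,
    poll_timeout_s: float,
) -> List[Item]:
    """Build one batch (hypothetical model); may return [] if nothing is ready."""
    batch = take_ready_items(holding, in_flight, max_per_batch)
    while len(batch) < max_per_batch:
        try:
            item = pending.get(timeout=poll_timeout_s)
        except queue.Empty:
            # The pending queue is drained (e.g. the producer is blocked).
            # Instead of waiting for another interval, re-check the holding
            # list for keys that have since completed and submit what we have.
            batch.extend(take_ready_items(holding, in_flight, max_per_batch - len(batch)))
            break
        if item[0] in in_flight:
            # Key is already running or already in this batch: defer it.
            holding.append(item)
        else:
            in_flight.add(item[0])
            batch.append(item)
    return batch
```

A real executor would call something like this from a worker thread, guard in_flight with a lock, hand each non-empty batch to its processing function, and remove the processed keys from in_flight afterwards; that bookkeeping is deliberately left out of the sketch.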

Reverts the old test so it only tests batching behavior, and adds a new deadlock test that covers both the original deadlock failure and this new 30s-wait failure. Without this change, the test times out after 10 seconds; with it, the test passes in ~1s.
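A test of this kind can turn a hang into a quick, explicit failure by running the scenario on a separate thread with a deadline. The sketch below shows that pattern in isolation; run_with_deadline is a hypothetical helper, not the PR's actual test code, and timeout_s plays the role of the 10-second limit mentioned above.

```python
import threading
from typing import Callable, List


def run_with_deadline(fn: Callable[[], object], timeout_s: float) -> None:
    """Run fn on a daemon thread; raise TimeoutError if it does not finish in time.

    Hypothetical helper. Exceptions raised by fn are re-raised in the caller
    so real failures are not reported as successes.
    """
    errors: List[BaseException] = []

    def target() -> None:
        try:
            fn()
        except BaseException as exc:  # capture for re-raising in the caller
            errors.append(exc)

    # daemon=True so a genuinely stuck thread does not keep the process alive.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"did not finish within {timeout_s}s (possible deadlock or stall)")
    if errors:
        raise errors[0]
```

Wrapping a scenario as run_with_deadline(scenario, timeout_s=10) fails after 10 seconds if the executor stalls and returns as soon as the scenario completes otherwise. Note that a stuck thread is only abandoned, not killed, so this is a test-side guard rather than a recovery mechanism.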
