MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

Feature Request: Add Batch Ingestion Endpoint for OpenLineage Events #2918

Open algorithmy1 opened 1 month ago

algorithmy1 commented 1 month ago

Currently, the Marquez API for OpenLineage events (/api/v1/lineage) accepts one event per request, as seen in OpenLineageResource.java#L67. While this is suitable for real-time ingestion, it becomes inefficient when we need to ingest many events at once, because each event costs a separate HTTP request.
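To make the cost concrete, here is a minimal client-side sketch of the one-event-per-request pattern. It only builds the requests and sends nothing. The base URL, the helper name `build_lineage_requests`, and the trimmed-down event bodies are assumptions for illustration; real OpenLineage events carry more fields.

```python
import json
import urllib.request

# Assumption: a Marquez instance reachable at this address.
MARQUEZ_URL = "http://localhost:5000"


def build_lineage_requests(events, base_url=MARQUEZ_URL):
    """Build one POST request per OpenLineage event (nothing is sent here)."""
    requests = []
    for event in events:
        body = json.dumps(event).encode("utf-8")
        requests.append(
            urllib.request.Request(
                url=f"{base_url}/api/v1/lineage",
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests


# Hypothetical, heavily trimmed events used only to count requests.
events = [
    {"eventType": "START", "job": {"namespace": "demo", "name": "job_a"}},
    {"eventType": "COMPLETE", "job": {"namespace": "demo", "name": "job_a"}},
    {"eventType": "START", "job": {"namespace": "demo", "name": "job_b"}},
]

requests = build_lineage_requests(events)
print(len(requests))  # number of HTTP round trips needed, one per event
```

With N events to backfill, this pattern needs N round trips, which is the overhead a batch endpoint would remove.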

Use Case:

Proposal:

(Or even update the current one /api/v1/lineage to accept both options)
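As a rough illustration of the "accept both options" idea, the sketch below normalizes a request body into a list of events, whether the body is a single JSON object or a JSON array. This is not Marquez code (Marquez is written in Java); the function name `parse_lineage_payload` and the error handling are hypothetical.

```python
import json


def parse_lineage_payload(raw_body):
    """Return a list of events whether the body is one event or an array."""
    payload = json.loads(raw_body)
    if isinstance(payload, dict):
        # Single event: wrap it so downstream code always sees a list.
        return [payload]
    if isinstance(payload, list) and all(isinstance(e, dict) for e in payload):
        # Batch of events: pass through unchanged.
        return payload
    raise ValueError("expected a JSON object or an array of JSON objects")


single = parse_lineage_payload('{"eventType": "START"}')
batch = parse_lineage_payload('[{"eventType": "START"}, {"eventType": "COMPLETE"}]')
print(len(single), len(batch))
```

A single route handling both shapes keeps existing clients working, at the price of a less explicit contract than a dedicated batch endpoint.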

Benefits:

wslulciuc commented 1 month ago

Thanks for the suggestion, @algorithmy1! We couldn't agree more on the benefits you outlined. The good news is that we've been prototyping such an endpoint for OpenLineage batch events; see v2.LineageResource.collectBatchOf(BatchOfEvents). The endpoint will be available in Marquez 0.51.0.