MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

Feature Request: Add Batch Ingestion Endpoint for OpenLineage Events #2918

Open algorithmy1 opened 2 weeks ago

algorithmy1 commented 2 weeks ago

Currently, the Marquez API for OpenLineage events (/api/v1/lineage) accepts one event per request, as seen in OpenLineageResource.java#L67. While this is suitable for real-time ingestion, it becomes inefficient when many events need to be ingested at once, because every event costs a separate HTTP round trip.
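To make the cost concrete, here is a minimal sketch of what a client has to do today: build one POST per event. Only the /api/v1/lineage path comes from this issue; the helper name `build_lineage_requests`, the base URL, and the sample event values are assumptions for illustration. The requests are only constructed, not sent.

```python
import json
from urllib.request import Request

LINEAGE_PATH = "/api/v1/lineage"  # endpoint named in this issue


def build_lineage_requests(base_url, events):
    """Build one POST request per OpenLineage event (nothing is sent)."""
    url = base_url.rstrip("/") + LINEAGE_PATH
    return [
        Request(
            url,
            data=json.dumps(event).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for event in events
    ]


# Placeholder events; the field values are made up for illustration.
sample_events = [
    {
        "eventType": "START",
        "eventTime": "2024-01-01T00:00:00Z",
        "run": {"runId": "00000000-0000-0000-0000-00000000000%d" % i},
        "job": {"namespace": "example", "name": "job_%d" % i},
        "producer": "https://example.com/producer",
    }
    for i in range(3)
]

# Assumed base URL; adjust to your deployment.
requests_to_send = build_lineage_requests("http://localhost:5000", sample_events)
print(len(requests_to_send), "requests for", len(sample_events), "events")
```

With N events the client issues N requests, which is the overhead a batch endpoint would remove.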

Use Case:

Proposal:

(Alternatively, update the existing /api/v1/lineage endpoint to accept either a single event or a batch of events.)
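One way an endpoint could accept both shapes is to normalize the request body into a list before processing. The sketch below is purely illustrative and is not Marquez code: the function name `normalize_lineage_payload` and the assumption that a batch arrives as a plain JSON array are hypothetical.

```python
def normalize_lineage_payload(payload):
    """Return a list of events from either a single event or a batch.

    Hypothetical helper: assumes a single event is a JSON object (dict)
    and a batch is a JSON array (list) of such objects.
    """
    if isinstance(payload, dict):
        return [payload]
    if isinstance(payload, list) and all(isinstance(e, dict) for e in payload):
        return list(payload)
    raise ValueError("payload must be an event object or a list of event objects")


single = {"eventType": "START"}
batch = [{"eventType": "START"}, {"eventType": "COMPLETE"}]

print(len(normalize_lineage_payload(single)))
print(len(normalize_lineage_payload(batch)))
```

The trade-off is that one route handles two body shapes, so validation errors have to say which shape was expected; a separate batch route avoids that ambiguity.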

Benefits:

wslulciuc commented 23 hours ago

Thanks for the suggestion, @algorithmy1! We couldn't agree more on the benefits you outlined. The good news is that we've been prototyping such an endpoint for OpenLineage batch events; see v2.LineageResource.collectBatchOf(BatchOfEvents). The endpoint will be available in Marquez 0.51.0.