Open mobuchowski opened 2 years ago
While Marquez will support an event of type `RUNNING`, when considering this in the context of a streaming job, we may need to consider the impact of this event on job versions and dataset versions. Currently, Marquez sets the current version of a job and a dataset only when receiving a `COMPLETE` event. Dataset versions are created before then, but the dataset record itself isn't updated until `COMPLETE`. Job versions aren't created at all until a `COMPLETE` event is received. Most importantly, lineage only considers the `current_version_uuid` column of the `jobs` table. This means that a streaming job won't show any lineage at all until the job terminates with a `COMPLETE` event. We can update the logic here, but we need to know it's a streaming job. Perhaps a facet to report that it's a streaming job, not a batch job?
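
To make the proposal concrete, here is a minimal sketch of the decision described above: update the current version on `COMPLETE` as today, and additionally on `RUNNING` when the job declares itself as streaming. This is not Marquez code. The `RunEvent` class, the `jobType` facet name, and its `processingType` field are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RunEvent:
    """Simplified stand-in for an incoming lineage event (hypothetical)."""
    event_type: str
    job_facets: dict = field(default_factory=dict)


def is_streaming(event: RunEvent) -> bool:
    # Assumption: a job facet named "jobType" with a "processingType" field
    # marks the job as streaming. Both names are illustrative only.
    facet = event.job_facets.get("jobType", {})
    return facet.get("processingType") == "STREAMING"


def should_update_current_version(event: RunEvent) -> bool:
    # Behaviour described in the comment above: only COMPLETE sets the
    # current job and dataset version.
    if event.event_type == "COMPLETE":
        return True
    # Proposed extension: a RUNNING event from a streaming job also sets it,
    # so lineage is visible before the job terminates.
    return event.event_type == "RUNNING" and is_streaming(event)
```

With this rule, a batch job that emits `RUNNING` keeps the existing behaviour, while a streaming job would get a current version, and therefore lineage, as soon as its first `RUNNING` event arrives.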
OpenLineage introduces the `RUNNING` event type, which models a continuous streaming job that is currently running, to differentiate it from the generic `OTHER` event type. Related issues are https://github.com/OpenLineage/OpenLineage/issues/946 and the discussion here: https://github.com/OpenLineage/OpenLineage/issues/599

Are there any possible problems within Marquez with receiving those types of events? I know `LineageEvent` has `String eventType`, but there could be something else dependent on the existing event types.
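
As an illustration of the concern, code that maps the raw `eventType` string onto a fixed set of known values is the kind of place where a new type could cause trouble. The sketch below shows one tolerant way to handle it, falling back to `OTHER` for anything unrecognised. It is an assumption about how such a mapping might look, not a description of how Marquez parses events; `EventType` and `parse_event_type` are hypothetical names.

```python
from enum import Enum


class EventType(Enum):
    """Event types assumed for this sketch, including the new RUNNING."""
    START = "START"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    ABORT = "ABORT"
    FAIL = "FAIL"
    OTHER = "OTHER"


def parse_event_type(raw) -> EventType:
    # Non-string or missing values are treated as OTHER rather than raising.
    if not isinstance(raw, str):
        return EventType.OTHER
    try:
        return EventType(raw.upper())
    except ValueError:
        # Unknown types degrade to OTHER so newer clients do not break ingestion.
        return EventType.OTHER
```

Any logic that branches on the parsed value (for example, "is this run finished?") would still need to be reviewed for how it treats `RUNNING`.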