Open davidjgoss opened 4 months ago
I'd be happy to contribute this change (since I would also like to see the feature implemented), but would probably need a little guidance on how to get started.
Great suggestion @davidjgoss; there's a similar discussion in issue https://github.com/MarquezProject/marquez/issues/2874
The OpenLineage standard column lineage facet has been extended in 1.17.1 so that each field in `inputFields` can now have an array of `transformations` describing transformations specific to that input field in the context of the output field. See https://github.com/OpenLineage/OpenLineage/pull/2756.

Ideally Marquez should support storing and serving this data if present in OpenLineage events.
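For illustration, a column lineage facet carrying per-input-field transformations might look like the sketch below. The dataset and field names are made up, and the keys inside each transformation (`type`, `subtype`, `description`, `masking`) are my reading of the linked PR, so treat them as assumptions and check them against the spec. The helper `transformations_by_input` is hypothetical, not part of OpenLineage or Marquez.

```python
# Hypothetical columnLineage facet fragment; all dataset and field names are invented.
# The transformation keys are assumed from the linked PR, not verified here.
column_lineage_facet = {
    "fields": {
        "total_spend": {
            "inputFields": [
                {
                    "namespace": "postgres://db",
                    "name": "public.orders",
                    "field": "amount",
                    # Transformations specific to this input field.
                    "transformations": [
                        {
                            "type": "DIRECT",
                            "subtype": "AGGREGATION",
                            "description": "SUM(amount)",
                            "masking": False,
                        }
                    ],
                },
                {
                    # An input field from a producer that sends no transformations.
                    "namespace": "postgres://db",
                    "name": "public.orders",
                    "field": "customer_id",
                },
            ]
        }
    }
}


def transformations_by_input(facet):
    """Map (output field, namespace, dataset, input field) to its transformations."""
    result = {}
    for output_field, details in facet.get("fields", {}).items():
        for input_field in details.get("inputFields", []):
            key = (
                output_field,
                input_field["namespace"],
                input_field["name"],
                input_field["field"],
            )
            # Events without the new array yield an empty list.
            result[key] = input_field.get("transformations", [])
    return result


by_input = transformations_by_input(column_lineage_facet)
```

Defaulting to an empty list keeps events from older producers, which do not send `transformations`, on the same code path.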
Note that the existing `transformationType` and `transformationDescription` fields at the output field level still exist but have been deprecated.

Database
The corresponding table in Marquez would be `column_lineage`, with each row there effectively representing one entry in `inputFields`. We could add another table joining with this, e.g. `column_lineage_transformations`, or, perhaps more pragmatically, use a JSON column on the existing table to hold transformations.

API
The `transformations` array could be added to the `ColumnLineageInputField` model, which is included in the column lineage response and the dataset response.
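To make the JSON-column option and the API change a bit more concrete, here is a rough sketch of storing the array alongside each input field row and serving it back. It uses an in-memory SQLite table as a stand-in for Marquez's database, and the table is heavily simplified; the column names, the helper functions `store_input_field` and `load_input_fields`, and the keys of the returned dictionaries are all assumptions for illustration, not the actual Marquez schema or model.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for column_lineage: one row per inputFields entry.
conn.execute(
    """
    CREATE TABLE column_lineage (
        output_field    TEXT NOT NULL,
        input_namespace TEXT NOT NULL,
        input_dataset   TEXT NOT NULL,
        input_field     TEXT NOT NULL,
        transformations TEXT  -- JSON array; NULL when the event carries none
    )
    """
)


def store_input_field(conn, output_field, input_field):
    """Insert one inputFields entry, serializing transformations as JSON."""
    transformations = input_field.get("transformations")
    conn.execute(
        "INSERT INTO column_lineage VALUES (?, ?, ?, ?, ?)",
        (
            output_field,
            input_field["namespace"],
            input_field["name"],
            input_field["field"],
            json.dumps(transformations) if transformations is not None else None,
        ),
    )


def load_input_fields(conn, output_field):
    """Return input fields shaped roughly like a ColumnLineageInputField."""
    rows = conn.execute(
        "SELECT input_namespace, input_dataset, input_field, transformations "
        "FROM column_lineage WHERE output_field = ? ORDER BY rowid",
        (output_field,),
    ).fetchall()
    return [
        {
            "namespace": namespace,
            "dataset": dataset,
            "field": field,
            # Rows written before the change (NULL) are served as an empty array.
            "transformations": json.loads(raw) if raw is not None else [],
        }
        for namespace, dataset, field, raw in rows
    ]


store_input_field(conn, "total_spend", {
    "namespace": "postgres://db", "name": "public.orders", "field": "amount",
    "transformations": [{"type": "DIRECT", "subtype": "AGGREGATION"}],
})
store_input_field(conn, "total_spend", {
    "namespace": "postgres://db", "name": "public.orders", "field": "customer_id",
})
served = load_input_fields(conn, "total_spend")
```

A nullable JSON column means existing rows need no backfill, at the cost of not being able to query individual transformations as easily as with a separate `column_lineage_transformations` table.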