Open jochenchrist opened 2 months ago
Another example:
models:
orders:
description: One record per order. Includes cancelled and deleted orders.
type: table
fields:
order_id:
$ref: '#/definitions/order_id'
required: true
unique: true
primary: true
lineage:
inputFields:
- namespace: kafka
name: com.example.checkout.order.v1 # kafka topic name
field: orderId
transformations:
- type: DIRECT
subtype: IDENTITY
- namespace: kafka
name: com.example.checkout.order.v1 # kafka topic name
field: status
transformations:
- type: INDIRECT
subtype: FILTER
description: filtered on status = 'successful'
order_timestamp:
description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
type: timestamp
required: true
example: "2024-09-09T08:30:00Z"
lineage:
inputFields:
- namespace: kafka
name: com.example.checkout.order.v1 # kafka topic name
field: orderPaymentSucessful
transformations:
- type: DIRECT
subtype: IDENTITY
order_total:
description: Total amount the smallest monetary unit (e.g., cents).
type: long
required: true
example: "9999"
lineage:
inputFields:
- namespace: kafka
name: com.example.checkout.order.v1 # kafka topic name
field: lineitems[].price
transformations:
- type: DIRECT
subtype: AGGREGATION
description: aggregated all line item prices
customer_id:
description: Unique identifier for the customer.
type: text
minLength: 10
maxLength: 20
customer_email_address:
description: The email address, as entered by the customer.
type: text
format: email
required: true
pii: true
classification: sensitive
lineage:
inputFields:
- namespace: kafka
name: com.example.checkout.order.v1 # kafka topic name
field: emailAddress
transformations:
- type: DIRECT
subtype: TRANSFORMATION
masking: true
Support for field-level upstream lineage could be aligned to OpenLinage format for "inputFields" (https://openlineage.io/docs/spec/facets/dataset-facets/column_lineage_facet/)
First draft, open to discuss: