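Recently, we've seen various bugs reported for NodeID parsing issues. A NodeID consists of multiple parts (i.e. metadata) delimited by a colon (:), and can be of type dataset, job, etc.
We defined a NodeID in this way in order to encode metadata about the node type and ensure globally unique IDs; the LineageAPI returns graph nodes with all metadata associated with that given node type. For example, below is the NodeID for dataset food_delivery:public.delivery_7_days: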
dataset:food_delivery:public.delivery_7_days
where food_delivery is the namespace and public.delivery_7_days is the name of the dataset. A call to the LineageAPI will return the graph node:
{
  "id": "dataset:food_delivery:public.delivery_7_days",
  "type": "DATASET",
  "data": {
    "id": { "namespace": "food_delivery", "name": "public.delivery_7_days" },
    "type": "DB_TABLE",
    "name": "public.delivery_7_days",
    "physicalName": "public.delivery_7_days",
    "createdAt": "2024-10-24T19:27:05Z",
    "updatedAt": "2024-10-24T22:36:06Z",
    "namespace": "food_delivery",
    "sourceName": "food_delivery_db",
    "fields": [
      { "name": "order_id", "type": "INTEGER", "description": "The ID of the order." },
      { "name": "order_placed_on", "type": "TIMESTAMP", "description": "ISO-8601 timestamp for when the order was placed." },
      { "name": "order_dispatched_on", "type": "TIMESTAMP", "description": "ISO-8601 timestamp for dispatch." },
      { "name": "order_delivered_on", "type": "TIMESTAMP", "description": "ISO-8601 timestamp for delivery." },
      { "name": "customer_email", "type": "VARCHAR", "description": "Customer's email address." },
      { "name": "customer_address", "type": "VARCHAR", "description": "Customer's physical address." },
      { "name": "menu_id", "type": "INTEGER", "description": "ID of the related menu." },
      { "name": "restaurant_id", "type": "INTEGER", "description": "ID of the restaurant." },
      { "name": "restaurant_address", "type": "VARCHAR", "description": "Restaurant's address." },
      { "name": "menu_item_id", "type": "INTEGER", "description": "ID of the menu item." },
      { "name": "category_id", "type": "INTEGER", "description": "ID of the category." },
      { "name": "discount_id", "type": "INTEGER", "description": "ID of the discount." },
      { "name": "city_id", "type": "INTEGER", "description": "ID of the city." },
      { "name": "driver_id", "type": "INTEGER", "description": "ID of the driver." }
    ],
    "tags": [],
    "lastModifiedAt": null,
    "description": null,
    "lastLifecycleState": ""
  },
  "inEdges": [
    { "origin": "job:food_delivery:etl_delivery_7_days", "destination": "dataset:food_delivery:public.delivery_7_days" }
  ],
  "outEdges": [
    { "origin": "dataset:food_delivery:public.delivery_7_days", "destination": "job:food_delivery:delivery_times_7_days" }
  ]
}
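As a rough illustration of how a client might read this response, the sketch below pulls the upstream and downstream neighbours out of inEdges and outEdges. The helper name `neighbours` and the trimmed `node` payload are illustrative assumptions, not part of the LineageAPI.

```python
import json

# Trimmed copy of the graph node above; only the fields used here are kept.
node = json.loads("""
{
  "id": "dataset:food_delivery:public.delivery_7_days",
  "type": "DATASET",
  "inEdges": [
    {"origin": "job:food_delivery:etl_delivery_7_days",
     "destination": "dataset:food_delivery:public.delivery_7_days"}
  ],
  "outEdges": [
    {"origin": "dataset:food_delivery:public.delivery_7_days",
     "destination": "job:food_delivery:delivery_times_7_days"}
  ]
}
""")


def neighbours(node):
    """Return (upstream, downstream) NodeIDs for a graph node (hypothetical helper)."""
    # An in-edge ends at this node, so its origin is the upstream neighbour.
    upstream = [edge["origin"] for edge in node["inEdges"]]
    # An out-edge starts at this node, so its destination is the downstream neighbour.
    downstream = [edge["destination"] for edge in node["outEdges"]]
    return upstream, downstream


upstream, downstream = neighbours(node)
print("upstream:", upstream)
print("downstream:", downstream)
```

Here the dataset is written by the job etl_delivery_7_days and read by the job delivery_times_7_days.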
Error on NodeId.parse()
But what if the namespace contains a colon (:)? Our NodeId.parse() method errors (not fun!). For example, node parsing will error for the namespace:
trino://trino-integration-test:1337
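The failure can be reproduced with a minimal sketch. The function `naive_parse` below is a hypothetical stand-in that assumes a NodeID has exactly three colon-delimited parts; it is not the actual NodeId.parse() implementation, which may differ. The dataset name paired with the namespace is also only illustrative.

```python
def naive_parse(node_id):
    """Split a NodeID into (type, namespace, name).

    Hypothetical stand-in for NodeId.parse(): assumes exactly three
    colon-delimited parts, which is the assumption that breaks.
    """
    parts = node_id.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {node_id!r}")
    node_type, namespace, name = parts
    return node_type, namespace, name


# Works when neither the namespace nor the name contains a colon.
print(naive_parse("dataset:food_delivery:public.delivery_7_days"))

# The colons inside the namespace produce extra parts, so parsing fails.
bad_id = "dataset:trino://trino-integration-test:1337:public.delivery_7_days"
try:
    naive_parse(bad_id)
except ValueError as err:
    print("parse error:", err)
```

The delimiter is also a legal character in the data, so the ID alone does not say where the namespace ends and the name begins.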
We need to move away from a NodeId with encoded metadata (no longer needed as we move towards a lightweight lineage graph response: just nodes and edges).
Use UUIDs as NodeIDs
Let's move to using UUIDs for NodeIDs when the lineage graph returns just nodes and edges and supports the following lineage graphs:
dataset -> dataset
dataset -> column/field
column/field -> dataset
job -> job
job -> dataset
dataset -> job
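A minimal sketch of what a nodes-and-edges graph keyed by UUIDs could look like is shown below. The field names (`id`, `type`, `namespace`, `name`), the helper `new_node`, and the choice of random version-4 UUIDs are all assumptions for illustration; the proposal does not specify them.

```python
import uuid


def new_node(node_type, namespace, name):
    """Create a node whose ID is an opaque UUID (hypothetical helper).

    Metadata lives in separate fields, so the ID never needs to be parsed.
    """
    return {
        "id": str(uuid.uuid4()),
        "type": node_type,
        "namespace": namespace,
        "name": name,
    }


# A namespace containing colons is harmless because it is not part of the ID.
namespace = "trino://trino-integration-test:1337"
job = new_node("JOB", namespace, "etl_delivery_7_days")
dataset = new_node("DATASET", namespace, "public.delivery_7_days")

graph = {
    "nodes": [job, dataset],
    # job -> dataset edge, expressed purely in terms of node UUIDs.
    "edges": [{"origin": job["id"], "destination": dataset["id"]}],
}

# uuid.UUID() raises ValueError if the string is not a valid UUID.
uuid.UUID(dataset["id"])
print(graph["edges"])
```

Because the IDs carry no metadata, edges only need to reference two UUIDs, and the same shape covers each of the lineage graphs listed above.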