datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0
9.75k stars 2.88k forks source link

JSON Patch updates fail when column names have slashes (/) or tildes (~) #10716

Open ipolding-cais opened 3 months ago

ipolding-cais commented 3 months ago

Describe the bug When applying a JSON patch com.linkedin.metadata.aspect.patch.template.TemplateUtil#populateTopLevelKeys uses a JsonPatch as a template for a node to transform.

JsonPatch objects will have / and ~ encoded as ~1 and ~0 respectively. This needs to be converted back when creating an object based on the paths of a JSON Patch

To Reproduce Steps to reproduce the behavior:

  1. Trigger a metadata change that will apply a patch to a column name with a / in it:
    {\"op\": \"add\", \"path\": \"/fineGrainedLineages/NONE/urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,my_database.my_schema.my_table,PROD),slash is ~1)", \"value\": {\"confidenceScore\": 1.0}}
  2. Run ingestion pipeline
  3. See error

Expected behavior The "target" object should have a key with the slash unescaped so it looks something like this:

{"/fineGrainedLineages/NONE/urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,my_database.my_schema.my_table,PROD),slash is /)" : {}}

However, the target retains the encoded / so it looks like this:

{"/fineGrainedLineages/NONE/urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,my_database.my_schema.my_table,PROD),slash is ~1)" : {}}

However, the JSON Pointer logic correctly tries to find:

"/fineGrainedLineages/NONE/urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,my_database.my_schema.my_table,PROD),slash is /)"

Additional context This logic would fix it

public static JsonNode populateTopLevelKeys(JsonNode transformedNode, JsonPatch jsonPatch) {
        JsonNode transformedNodeClone = transformedNode.deepCopy();
        List<Pair<PatchOperationType, String>> paths = getPaths(jsonPatch);
        for (Pair<PatchOperationType, String> operationPath : paths) {
            String[] keys = (String[]) Arrays.stream(
                            operationPath.getSecond().split("/"))
                    .map(k -> k.replace("~1", "/").replace("~0", "~")).toArray();
ipolding-cais commented 3 months ago

Here is the PR with the fix

https://github.com/datahub-project/datahub/pull/10717