bazaarvoice / jolt

JSON to JSON transformation library written in Java.
Apache License 2.0

Jolt spec to remove fields having null and empty values in JSON containing array lists #1263

Open lafiza opened 4 months ago

lafiza commented 4 months ago

I need a Jolt spec that removes the fields with null or empty values from the JSON.

Input Json:

{
  "properties": {
    "email_home": null,
    "email_work": "HI",
    "websites": "",
    "email": {
      "education_table": [
        {
          "degree": "",
          "degree_type": "hello",
          "university": "",
          "start_date": null,
          "end_date": "",
          "concentration": ""
        },
        {
          "degree": "",
          "degree_type": "hello",
          "university": "",
          "start_date": null
        }
      ],
      "verifiedDate": "",
      "userId": "rteee"
    },
    "telephone": {
      "phoneNumbers": [
        {
          "phtype": "hello",
          "phuniv": "",
          "phstart": null
        },
        {
          "phtype": "hello",
          "phuniv": "",
          "phstart": null
        }
      ],
      "avail": "check",
      "date": "",
      "date2": null
    }
  }
}

Jolt Tried

[
  {
    "operation": "shift",
    "spec": {
      "properties": {
        "": "properties.&",
        "properties.email": {
          // If the below fields are null, remove them from the JSON
          "education_table": {
            "": {
              "": {
                "": {
                  "@1": "field_to_be_deleted"
                },
                "": {
                  "@1": "properties.&4[&3].&2"
                }
              }
            }
          }
        },
        "properties.telephone": {
          // If the below fields are null, remove them from the JSON
          "phoneNumbers": {
            "": {
              "": {
                "": {
                  "@1": "field_to_be_deleted"
                },
                "*": {
                  "@1": "properties.&4[&3].&2"
                }
              }
            }
          }
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      // Remove the field field_to_be_deleted
      "field_to_be_deleted": ""
    }
  }
]

LucaBiscotti commented 4 months ago

Hi @lafiza, I didn't fully understand what you were trying to do in your Jolt transformation. It looks like you're trying to move all the empty values under a single key so you can then remove them. That logic isn't wrong, but Jolt can't select values based on their content (e.g., select all the values that contain the letter 'e').

I came up with this solution:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "properties": "=recursivelySquashNulls"
    }
  },
  {
    "operation": "shift",
    "spec": {
      "properties": {
        "email": {
          "education_table": {
            "*": {
              "*": {
                "$": "&5.&4.&3.@0"
              }
            }
          },
          "*": {
            "$": "&3.&2.@0"
          }
        },
        "telephone": {
          "phoneNumbers": {
            "*": {
              "*": {
                "$": "&5.&4.&3.@0"
              }
            }
          },
          "*": {
            "$": "&3.&2.@0"
          }
        },
        "*": {
          "$": "&2.@0"
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "properties": {
        "": "",
        "email": {
          "": "",
          "education_table": {
            "": ""
          }
        },
        "telephone": {
          "": "",
          "phoneNumbers": {
            "": ""
          }
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "properties": {
        "email": {
          "education_table": {
            "*": {
              "*": {
                "$1": "&5.&4.&3.[&1].@0"
              }
            }
          },
          "*": {
            "$": "&3.&2.@0"
          }
        },
        "telephone": {
          "phoneNumbers": {
            "*": {
              "*": {
                "$1": "&5.&4.&3.[&1].@0"
              }
            }
          },
          "*": {
            "$": "&3.&2.@0"
          }
        },
        "*": {
          "$": "&2.@0"
        }
      }
    }
  }
]

What this does:

  1. uses the recursivelySquashNulls function to delete all the null values automatically;
  2. goes through the JSON and swaps each key with its value (so "date": "" becomes "": "date");
  3. applies the remove operation to all the empty keys, which works because remove can only match on keys;
  4. swaps the keys and values back, restoring the original JSON without the empty/null values.
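
To make the four steps easier to follow, here is a rough Python sketch of the same idea applied to a single flat object. It is only an illustration of the logic, not how Jolt works internally: the helper names (`squash_nulls`, `swap`, `drop_empty_key`, `unswap`) are made up for this example, `squash_nulls` is a flat stand-in for `recursivelySquashNulls`, and all values are assumed to be strings or null.

```python
def squash_nulls(obj):
    """Step 1: drop keys whose value is None (flat stand-in for recursivelySquashNulls)."""
    return {key: value for key, value in obj.items() if value is not None}


def swap(obj):
    """Step 2: turn values into keys; keys that share a value are grouped in a list."""
    swapped = {}
    for key, value in obj.items():
        swapped.setdefault(value, []).append(key)
    return swapped


def drop_empty_key(obj):
    """Step 3: remove the "" key, which now holds every field that had an empty value."""
    return {key: value for key, value in obj.items() if key != ""}


def unswap(swapped):
    """Step 4: swap back so the original field names are keys again."""
    return {key: value for value, keys in swapped.items() for key in keys}


# Hypothetical flat record, taken from the "telephone" part of the input.
record = {"avail": "check", "date": "", "date2": None}

step1 = squash_nulls(record)    # "date2" is gone
step2 = swap(step1)             # "date" now sits under the "" key
step3 = drop_empty_key(step2)   # the "" key is removed
step4 = unswap(step3)           # back to field name -> value

print(step4)
```

The sketch also shows why the swap is needed at all: the removal in step 3 only looks at keys, so the empty strings have to become keys before they can be dropped.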

This spec is very fragile: it only works for the exact structure of the example input you gave, so it is essentially a static solution. Let me know if this solves your problem.
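
For contrast, a structure-independent cleanup is easy to write outside Jolt. The following is a minimal Python sketch, assuming the JSON can be pre- or post-processed in code rather than with a Jolt spec; `prune` is a hypothetical helper, not part of Jolt. It walks objects and arrays at any depth and drops entries that are null or an empty string. Containers that end up empty are kept as they are.

```python
def prune(value):
    """Return a copy of value with None and "" entries removed at every depth."""
    if isinstance(value, dict):
        return {
            key: prune(item)
            for key, item in value.items()
            if item is not None and item != ""
        }
    if isinstance(value, list):
        return [prune(item) for item in value if item is not None and item != ""]
    return value


# Part of the input from this issue, written as Python literals.
data = {
    "telephone": {
        "phoneNumbers": [
            {"phtype": "hello", "phuniv": "", "phstart": None},
            {"phtype": "hello", "phuniv": "", "phstart": None},
        ],
        "avail": "check",
        "date": "",
        "date2": None,
    }
}

cleaned = prune(data)
print(cleaned)
```

Because the function never names a field, it does not need to change when the input gains or loses nesting levels, which is exactly what the static spec above cannot offer.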