Closed. srinvias closed this issue 5 years ago.
Spec
[
{
"operation": "shift",
"spec": {
"*": { // top level array
"id": { // use the id as a key in map
"*": "ids.&[]"
}
}
}
},
{
"operation": "shift",
"spec": {
"ids": {
"*": {
"$": "[#2].id" // grab all the keys, which are now unique
}
}
}
}
]
You have to do two shifts. The first one uses the input "id"s as keys in a map. The second one iterates over those keys, which are now unique, and puts them back into a list.
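To actually run a spec like this from code, the Jolt Java library's Chainr and JsonUtils entry points can be used. The sketch below is only illustrative: it assumes the standard Chainr.fromSpec / JsonUtils.jsonToList / JsonUtils.toJsonString methods and inlines the spec without the // comments, since a plain JSON parser may reject them.

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

public class DedupIdsExample {
    public static void main(String[] args) {
        // The two-shift spec from above, inlined without the // comments.
        String spec = "["
                + "{ \"operation\": \"shift\", \"spec\": { \"*\": { \"id\": { \"*\": \"ids.&[]\" } } } },"
                + "{ \"operation\": \"shift\", \"spec\": { \"ids\": { \"*\": { \"$\": \"[#2].id\" } } } }"
                + "]";

        String input = "[ { \"id\": 1 }, { \"id\": 2 }, { \"id\": 1 } ]";

        Chainr chainr = Chainr.fromSpec(JsonUtils.jsonToList(spec));
        Object output = chainr.transform(JsonUtils.jsonToList(input));

        // The duplicate id is dropped; note that because the second shift grabs
        // map keys, the ids come back as strings rather than numbers.
        System.out.println(JsonUtils.toJsonString(output));
    }
}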
First of all, thank you @milosimpson for the quick response.
Correct me if I am wrong, but the above solution only eliminates duplicate JSON records based on one field. We have scenarios that require eliminating duplicates based on multiple fields; in the example below those fields are domain, location, time, function, and unit. Please provide a Jolt spec to handle this. Thanks.
Or, put simply: eliminate duplicate JSON objects from an array of JSON.
Input :
[{ "domain": "www.google.com", "location": "newyork", "time": "CDT UTC-0500", "function": "PACK", "unit": "PACK_ESR" },
{ "domain": "www.yahoo.com", "location": "newyork", "time": "CDT UTC-0500", "function": "PACK", "unit": "PACK_ESR" },
{ "domain": "www.google.com", "location": "newyork", "time": "CDT UTC-0500", "function": "AOI_S1", "unit": "AOI_L31" },
{ "domain": "www.google.com", "location": "newyork", "time": "CDT UTC-0500", "function": "ALIGN", "unit": "ALIGN2" },
{ "domain": "www.yahoo.com", "location": "newyork", "time": "CDT UTC-0500", "function": "PACK", "unit": "PACK_ESR" },
{ "domain": "www.google.com", "location": "texas", "time": "CDT UTC-0500", "function": "PACK", "unit": "PACK_ESR" },
{ "domain": "www.hortonworks.com", "location": "newyork", "time": "CDT UTC-0500", "function": "ALIGN", "unit": "ALIGN2" } ]
Desired output (again, simply eliminating the duplicate JSON objects from the array):
[{ "domain": "www.google.com", "location": "newyork", "time": "CDT UTC-0500", "function": "PACK", "unit": "PACK_ESR" },
{ "domain": "www.yahoo.com", "location": "newyork", "time": "CDT UTC-0500", "function": "PACK", "unit": "PACK_ESR", { "domain": "www.google.com", "location": "texas", "time": "CDT UTC-0500", "function": "PACK", "unit": "PACK_ESR" },
{ "domain": "www.hortonworks.com", "location": "newyork", "time": "CDT UTC-0500", "function": "ALIGN", "unit": "ALIGN2" } ]
At best, Jolt can dedup a single field. Deduping a whole "sub-document" is not in its scope.
Thank you @milosimpson. Is it possible to delete duplicate JSON records in an array of JSON objects? If yes, may I know what the spec would be?
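Since Jolt itself only dedups on a single key, removing duplicate whole records (records equal on domain, location, time, function, and unit together) has to happen outside the transform, for example in the Java code that calls Jolt. A minimal sketch in plain Java with illustrative class and method names, assuming the array has already been parsed into a List of Maps:

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class WholeRecordDedup {
    // Removes duplicate records from a parsed JSON array, comparing every field
    // (domain, location, time, function, unit, ...) via Map.equals.
    // Keeps the first occurrence of each record and preserves the input order.
    public static List<Map<String, Object>> dedup(List<Map<String, Object>> records) {
        return new ArrayList<>(new LinkedHashSet<>(records));
    }
}

Any JSON parser that yields List<Map<String, Object>> (for example, the one already used to feed Jolt) can supply the input; equal maps collapse to a single entry regardless of how many fields they contain.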
Hi Milo, I have a similar scenario: what if I need to dedup using id but also keep the other elements in the target JSON, like below?
Input :
[
{
"id": 1,
"name": "jeorge",
"age": 25
},
{
"id": 2,
"name": "manhan",
"age": 25
},
{
"id": 1,
"name": "george",
"age": 225
}
]
Spec:
[
{
"operation": "shift",
"spec": {
"*": { // top level array
"id": { // use the id as a key in map
"*": "ids.&[]"
}
}
}
},
{
"operation": "shift",
"spec": {
"ids": {
"*": {
"$": "[#2].id" // grab all the keys, which are now unique
}
}
}
}
]
Output:
[
{
"id": 1,
"name": "jeorge",
"age": 25
},
{
"id": 2,
"name": "manhan",
"age": 25
}
]
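The two-shift spec above only carries the id through, so the name and age fields would not appear in its output. Keeping the whole record while deduping on id is again easiest outside the Jolt spec. A minimal sketch in plain Java (illustrative names, first occurrence wins), assuming the array is already parsed into a List of Maps:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupById {
    // Keeps the first record seen for each "id" and drops later duplicates, so
    // { "id": 1, "name": "jeorge", "age": 25 } survives and
    // { "id": 1, "name": "george", "age": 225 } is dropped.
    public static List<Map<String, Object>> dedupById(List<Map<String, Object>> records) {
        Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();
        for (Map<String, Object> record : records) {
            byId.putIfAbsent(record.get("id"), record); // first occurrence wins
        }
        return new ArrayList<>(byId.values());
    }
}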
I am trying to remove duplicate JSON records from a JSON array using a Jolt transformation. Here is an example I tried. Input:
Jolt spec:
Output:
I am getting only the selected records; along with that, I would like to remove the duplicates. Desired output:
Please provide the necessary spec. Thanks in advance.