dominictarr / JSONStream

rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects)

Simple Filter (remove) By Key Example? #124

Open petergdoyle opened 7 years ago

petergdoyle commented 7 years ago

Hi, can you suggest the simplest, most memory-efficient way to remove elements with a specific key? There is a reference to this in the JSONStream.parse(pattern, map) form of the parse method. Given your example doc:

{"total_rows":129,"offset":0,"rows":[
  { "id":"change1_0.6995461115147918"
  , "key":"change1_0.6995461115147918"
  , "value":{"rev":"1-e240bae28c7bb3667f02760f6398d508"}
  , "doc":{
      "_id":  "change1_0.6995461115147918"
    , "_rev": "1-e240bae28c7bb3667f02760f6398d508","hello":1}
  },
  { "id":"change2_0.6995461115147918"
  , "key":"change2_0.6995461115147918"
  , "value":{"rev":"1-13677d36b98c0c075145bb8975105153"}
  , "doc":{
      "_id":"change2_0.6995461115147918"
    , "_rev":"1-13677d36b98c0c075145bb8975105153"
    , "hello":2
    }
  }
]}

Suppose I wanted to remove all the "_id" and "_rev" elements wherever they occur in the document (recursively), with the keys to remove defined in a separate filter object:

{
"filter": [
    "_id", 
    "_rev"
  ]
}

What is the "best" (most efficient) way to do this: by a JSONPath, or by applying a map function with JSONStream.parse(pattern, map) and a general filter like 'star.star.star' (which you have mentioned before for processing large documents without loading the entire object into memory)? Also, since this is a stream processor, is the order of elements in the input document preserved?

dominictarr commented 7 years ago

I think the answer would be to set those keys to undefined or null. Don't use delete, because it's really slow (for some weird reason). There isn't a JSONStream feature to do this, but you could mutate the objects in the map function.
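
For illustration, the suggestion above could be sketched as a small recursive helper. `stripKeys` is a hypothetical name, not part of JSONStream; the sketch assumes the keys to drop come from the "filter" array in the question and that the result is written back out with JSON.stringify, which omits object properties whose value is undefined.

```javascript
// Hypothetical helper (not a JSONStream feature): recursively blank out
// every property whose name is in `keys`, mutating the object in place.
function stripKeys(node, keys) {
  if (Array.isArray(node)) {
    for (const item of node) stripKeys(item, keys);
  } else if (node !== null && typeof node === 'object') {
    for (const name of Object.keys(node)) {
      if (keys.has(name)) {
        node[name] = undefined; // assign instead of using `delete`
      } else {
        stripKeys(node[name], keys);
      }
    }
  }
  return node;
}

// Keys taken from the "filter" object in the question.
const filter = new Set(['_id', '_rev']);

// One row from the example document.
const row = {
  id: 'change1_0.6995461115147918',
  key: 'change1_0.6995461115147918',
  value: { rev: '1-e240bae28c7bb3667f02760f6398d508' },
  doc: {
    _id: 'change1_0.6995461115147918',
    _rev: '1-e240bae28c7bb3667f02760f6398d508',
    hello: 1,
  },
};

// JSON.stringify skips properties set to undefined, so the blanked keys
// do not appear in the output; the remaining keys keep their order.
const stripped = JSON.stringify(stripKeys(row, filter));
console.log(stripped);
```

With JSONStream this might be wired in as the map argument, for example `JSONStream.parse('rows.*', function (row) { return stripKeys(row, filter) })`. That form would only emit the rows, not the surrounding "total_rows" and "offset" fields.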

petergdoyle commented 7 years ago

Thanks for the quick reply.

So I guess first things first. I am having trouble with the parse function. In my case I want every element in the input stream, so I want an "any" JSONPath selection, but I also want to recurse to whatever depth the elements go. I want keys emitted as well, since I am trying to rewrite the JSON input directly to output with a few of the keyed values removed (filtered out), AND I want to preserve the order of elements in the input stream. It sounded so simple.

So if I use a parse pattern like '$*' I can get all keys and values, but it doesn't recurse...


    source.pipe(JSONStream.parse('$*'))
      .pipe(es.mapSync(function (data) {
        console.log(data);
      }));

==>
{ value: 129, key: 'total_rows' }
{ value: 0, key: 'offset' }
{ value: 
   [ { id: 'change1_0.6995461115147918',
       key: 'change1_0.6995461115147918',
       value: [Object],
       doc: [Object] },
     { id: 'change2_0.6995461115147918',
       key: 'change2_0.6995461115147918',
       value: [Object],
       doc: [Object] } ],
  key: 'rows' }

According to your documentation, it seems you can only recurse using {recurse: true}, but that seems to require the array form of the "match" value.

this selects nothing: source.pipe(JSONStream.parse('$..*'))
this selects nothing: source.pipe(JSONStream.parse(['$*',{recurse: true}]))
this selects nothing: source.pipe(JSONStream.parse(['*',{recurse: true}, {returnKeys: true}]))
this throws an error: source.pipe(JSONStream.parse('*..*'))

So I give up: what is the parse value that will select all nodes, return all keys and values starting at and including the top level, and recurse into every value?

Thanks!

mattmackay76 commented 6 years ago

I have the same issue with what to pass into parse. I tried many of the same things you mentioned above, like $..* and such. What I want to do is OR things together. I also tried regex, to no avail.

Example json:

{
  "Name": "abcName",
  "LargeCollection1" : [ ...you get the idea..... ],
  "LargeCollection2" : [ ...you get the idea, this is large hence needing to stream 15+ megs in here..... ],
  "Address": "123 some st",
  "City": "New York",
  "State":"NY"
}

So, I want something like JSONStream.parse("Name|Address|City") to fire on('data') so I can capture this while ignoring megs of data I don't care about. I should say it would be like filtering through the entire stream but should produce a new JSON object rather than just catching "Name" and returning "abcName".

Is it possible to get back something like:

{ "Name": "abcName", "Address": "123 some st", "City": "New York", "State":"NY" }

So basically only the root-level properties I care about, ignoring everything else, output as JSON.

Any advice on this would be awesome!..

-Matt
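
As a rough sketch of the collection step only: if the root-level entries arrive as { key, value } objects, the shape the '$*' pattern produced earlier in this thread, they could be folded into a single object. `pickRootKeys` is a hypothetical helper, not a JSONStream API, and the sample entries are hand-written stand-ins for stream output. Judging by that earlier output, each top-level value is fully assembled before it is emitted, so this probably does not avoid buffering the large collections.

```javascript
// Hypothetical helper (not a JSONStream API): fold { key, value } entries
// into one object, keeping only the wanted root-level keys.
function pickRootKeys(entries, wanted) {
  const out = {};
  for (const { key, value } of entries) {
    if (wanted.has(key)) out[key] = value;
  }
  return out;
}

// Hand-written entries standing in for what a '$*' parse would emit
// for the example document (large collections shortened to []).
const entries = [
  { key: 'Name', value: 'abcName' },
  { key: 'LargeCollection1', value: [] },
  { key: 'LargeCollection2', value: [] },
  { key: 'Address', value: '123 some st' },
  { key: 'City', value: 'New York' },
  { key: 'State', value: 'NY' },
];

const wanted = new Set(['Name', 'Address', 'City', 'State']);
const picked = pickRootKeys(entries, wanted);
console.log(JSON.stringify(picked));
```

In a real pipeline the same filtering could happen inside an on('data') handler, adding each wanted entry to the result object as it arrives and serialising it on 'end'.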