dominictarr / JSONStream

rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects)

Accessing beginning of json file and continue to parse the relevant data #99

Open Syrou opened 8 years ago

Syrou commented 8 years ago

Having the following json file:

{
"stash_index": "482-2327-985-1579-278",

"stashes": [
{
"account": "willy",
"stash": "Instance#22",
"items": [
{
"name": "Some sword"
},
{
"name": "Some Axe"
}
],
"public": true
}..... ALOT MORE IN THIS ARRAY
]
}

With the following node code:

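var JSONStream = require('JSONStream');
var request = require('request');

// Path pattern as posted; URL_TO_LOAD is assumed to be defined elsewhere.
var parser = JSONStream.parse([{'stashes': true}, {'account': true, 'items': true}]);
parser.on('data', function (data) {
  console.log("Adding: " + JSON.stringify(data));
});

// Commented-out alternatives:
//var itemsStream = JSONStream.parse(['stashes', true, 'items']);
//var stream = JSONSelect(['stashes', true, {'accountName': true, 'items': true}]);
request({url: URL_TO_LOAD}).pipe(parser);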

I manage to get out all the information I need just fine. However, I'd like to pick up 'stash_index' in the same go and index it against all the stashes saved to the db. How do I achieve that?

dominictarr commented 8 years ago

okay, lemme check I understand.

you are getting the contents of the stashes array as {account, items} pairs, but you'd rather have {stash_index, account, items}. the tricky thing is that stash_index is at a higher level in the object, so you'd get many items output with the same stash_index?

or does stash_index appear one time at the top of the file?

Syrou commented 8 years ago

One time at the top of the file :) And since the file itself is so big, I'd rather use the stream-based flow to keep the allocated memory down. I tried all sorts of filter streams, but I didn't manage to keep the stream sane and at the same time get out the top-level stash_index.

dominictarr commented 8 years ago

got it... same problem as https://github.com/dominictarr/JSONStream/issues/93

Syrou commented 8 years ago

As far as I can see it, can't we just access the first few bytes of the json file, lift them out, and discard the rest until the user has specified the properties of importance?

dominictarr commented 8 years ago

yeah, like, in your case, it could emit a header event once it gets to the stashes property, containing everything before that, so you'd get {stash_index: ...}. this would also work for couchdb, etc. I would be happy to merge such a pull request.
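
The header idea above could be sketched roughly as follows, using only Node's built-in stream module rather than JSONStream itself. `headerSniffer` is a hypothetical helper, not an existing JSONStream API. It assumes the properties before the array key are complete top-level values and that the quoted key text does not appear inside an earlier string value.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper (not part of JSONStream): forwards every chunk
// unchanged and emits one 'header' event with the top-level properties
// that appear before `arrayKey`.
function headerSniffer(arrayKey) {
  const decoder = new StringDecoder('utf8');
  const marker = JSON.stringify(arrayKey); // the key including its quotes
  let head = '';
  let done = false;

  return new Transform({
    transform(chunk, encoding, callback) {
      if (!done) {
        // Buffer only the start of the file until the key shows up.
        head += decoder.write(chunk);
        const at = head.indexOf(marker);
        if (at !== -1) {
          done = true;
          // Take the text before the key, drop the trailing comma,
          // and close the object so it parses on its own.
          const text = head.slice(0, at).replace(/,\s*$/, '') + '}';
          head = '';
          try {
            this.emit('header', JSON.parse(text));
          } catch (err) {
            return callback(err);
          }
        }
      }
      callback(null, chunk); // pass the original bytes downstream
    }
  });
}

// Possible usage with the parser from the question (left commented out,
// since URL_TO_LOAD and parser come from the original post):
// var stashIndex;
// var sniffer = headerSniffer('stashes');
// sniffer.on('header', function (header) { stashIndex = header.stash_index; });
// request({url: URL_TO_LOAD}).pipe(sniffer).pipe(parser);
```

Because the sniffer sits in front of the parser and forwards the bytes untouched, the existing 'data' handler keeps working, and only the text before the array key is ever held in memory.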