ddopson / underscore-cli

Command-line utility-belt for hacking JSON and Javascript.

Feature Request: Multi-object Stream parsing #2

Open patricknevindwyer opened 12 years ago

patricknevindwyer commented 12 years ago

I have a common use case where a data stream is actually one JSON object per line, with newlines delimiting records. It would be nice to be able to specify a record delimiter so I can avoid pulling out records manually.

As an example, I currently have a file that might look like:

{"name": "John", "role": "admin"}
{"name": "Jill", "role":"user"}

Contrived, to be sure, but representative of the problem. Currently I need to pluck out a line manually, such as:

head -n 1 file.json | underscore print --color

Instead, it would be tremendously useful to be able to do:

cat file.json | underscore print --color --multi --delimiter "\n"
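Internally, a `--multi` mode could be as simple as splitting on the delimiter and parsing each chunk. A minimal sketch (not actual underscore-cli code; `parseMultiJSON` is an invented name for illustration):

```javascript
// Hypothetical sketch: split newline-delimited input into parsed records.
function parseMultiJSON(text) {
  return text
    .split('\n')
    .filter(function (line) { return line.trim() !== ''; })  // skip blank lines
    .map(function (line) { return JSON.parse(line); });      // one object per line
}

var input = '{"name": "John", "role": "admin"}\n{"name": "Jill", "role":"user"}\n';
console.log(parseMultiJSON(input).length);  // 2
```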
ddopson commented 12 years ago

Awesome. I've been waiting for a reason to need to build this... :)

ddopson commented 12 years ago

... I am still thinking about this, I just haven't been able to block out a Saturday afternoon to hack it out.

Ideally, the JSON parser would intelligently interpret the input and detect the multi-record format without an explicitly specified delimiter, but that's a bit tricky. I might settle on implementing a line-delimited format (this is basically what the "text" format emits; the only delta is that when reading "text", lines are interpreted as strings instead of being JSON-parsed). That would be a very simple implementation and might be close enough to enable the core scenarios. ...

... although without more changes, the tool will read in the entire input set before doing any processing. I need to convert it to a stream-oriented processor where possible. I can file that as a second bug.
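The detection idea above could be approximated with a simple fallback: try the input as one JSON document first, and only then treat it as line-delimited. A hedged sketch (`parseFlexible` is an invented name, not part of underscore-cli, and this reads the whole input rather than streaming):

```javascript
// Hypothetical detection sketch: whole-document parse first, then
// fall back to one-JSON-object-per-line parsing if that fails.
function parseFlexible(text) {
  try {
    return [JSON.parse(text)];               // input is a single document
  } catch (e) {
    return text
      .split('\n')
      .filter(function (line) { return line.trim() !== ''; })
      .map(function (line) { return JSON.parse(line); });
  }
}
```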

I don't know about this weekend. Next weekend looks a bit better, so I'm tentatively scheduling implementation of this stuff for 2012-09-29.

larsyencken commented 12 years ago

This would be great. In large JSON datasets, the record-per-line style is the norm so that you don't have to parse the entire set into memory at once (as you're forced to in, say, XML). It also means you can skip an individual record that's corrupted.

+1

agnoster commented 11 years ago

We have the same issue, with JSON-formatted log files. Obviously it would need to handle streaming, or it would be pointless, but it's tricky because it would mean suddenly underscore functions like "map", "select" and so on would need to stream, too.

I was actually working on a tool like this for my own purposes, using a streaming JSON parser (so that even if the file was a HUGE JSON-formatted array you could stream results), but without the nifty idea of underscore delegation. But ultimately, I don't think there's a good way to re-use underscore and also support streaming. Unless someone has an idea for that?

jduprey commented 11 years ago

+1 to this feature. I just figured out that this is why underscore wasn't working for me. To work around it, I dumped the data to a file (one JSON object per line), then edited it into an array of objects: an opening bracket at the start of the file, a comma at the end of every line except the last, and a closing bracket at the end of the file.

e.g. qs.txt

[
{ "channel": { "keywords":[ ... ] } },
{ "channel": { "keywords":[ ... ] } },
...
{ "channel": { "keywords":[ ... ] } }
]

Then I could do this:

$ cat qs.txt | underscore select ".channel .keywords" | underscore flatten --outfmt text
Alcoa operating income margin
Alcoa dps
Alcoa revenue

I could probably script this conversion so that I wouldn't have to do it manually, but it would be better if underscore had a "stream" processing mode where it treated each line as a JSON object/file.
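The manual conversion above can indeed be scripted, e.g. with a small awk one-liner. A sketch (the sample records here are invented stand-ins for the elided `qs.txt` contents; it assumes every line is one complete JSON object with no blank lines):

```shell
# Create a small sample file in the one-object-per-line format.
printf '%s\n' '{"channel": {"keywords": ["Alcoa dps"]}}' \
              '{"channel": {"keywords": ["Alcoa revenue"]}}' > qs.txt

# Wrap the records in brackets and add a comma after every record
# except the last, producing a valid JSON array.
awk 'BEGIN  { print "[" }
     NR > 1 { print prev "," }
            { prev = $0 }
     END    { print prev; print "]" }' qs.txt > qs-array.txt
```

The resulting `qs-array.txt` could then be piped through `underscore select` as in the example above.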

rgarcia commented 11 years ago

+1 as well. If parsing is the biggest hurdle, JSONStream supports input of either form.

dreadjr commented 11 years ago

+1 as well, JSON log file formats, line delimited.

tonycpsu commented 11 years ago

+1. Using jsonpp for now, but would be much happier using underscore if it supported this.

ami-navon commented 10 years ago

+1 as well!

redirect11 commented 9 years ago

+1 for this one!

ddopson commented 9 years ago

It just needs to be coded ;-).

This is my #1 priority should I find time to dedicate to this tool; I work almost exclusively in C++ these days, so underscore-cli is a hobby and lives exclusively in my nights and weekends.

I'd also be more than happy to review a Pull Request. :-)

On Wed, Nov 12, 2014 at 2:49 PM, redirect11 notifications@github.com wrote:

> +1 for this one!


redirect11 commented 9 years ago

I don't have too much time... but I found a tricky way to achieve this:

readfile.js

var fs = require('fs'),
    readline = require('readline');

var rd = readline.createInterface({
    input: fs.createReadStream('./logs/all-logs.log'),
    output: process.stdout,
    terminal: false
});

var first = true;
console.log('[');
rd.on('line', function(line) {
    // emit the comma *before* each record after the first, so the
    // array ends without a trailing comma and stays valid JSON
    process.stdout.write((first ? '' : ',\n') + line);
    first = false;
});

rd.on('close', function() {
    process.stdout.write('\n]\n');
});

Then, by doing this:

node readfile | underscore pretty

I can read the JSON and pipe it (inspired by @jduprey).

I don't know how much effort a more elegant solution based on this would need. If you can give me some hints (for example, which part of the code handles file reading and processing), I will try to integrate something like this...