dflemstr / rq

Record Query - A tool for doing record analysis and transformation
Apache License 2.0

Skip invalid records #148

Open killercup opened 7 years ago

killercup commented 7 years ago

Is there an option, or a plan to add one, to skip invalid records in a file?

Let's say I want to pipe the output of cargo rustc --quiet -- --error-format json to rq to filter and operate on the error messages. Sadly, even with --quiet, the last few lines will be something like error: Could not compile `foo`. […] To learn more, run the command again with --verbose. This means I need to save the output to a file and manually edit it before I can use it with rq.

(By the way, I noticed this while generating the list for https://github.com/diesel-rs/diesel/issues/563.)

dflemstr commented 7 years ago

Cool that you're using rq for error analysis!

How do you think the error recovery should work? Let's say the error message looks like this:

error: Could not compile "foo" because true != false.

Should we parse "foo" as a JSON string? Should we parse true and false as JSON booleans?

For the record, I tried (just for fun) to produce the same output as you had, and this worked for me (with a clean diesel checkout; line-wrapped for readability). If we can live with the assumption that the output is always JSON objects, grep does the job:

rustup run nightly \
  cargo rustc \
    --features 'postgres sqlite chrono unstable' \
  -- \
    -D missing_docs \
    --error-format json 2>&1 |
  grep '^{' |
  rq -jH '
    filter [level, error] |
    flatMap spans |
    map (s) => { {file: `${s.file_name}:${s.line_start}`, text: s.text[0].text.trim()} } |
    uniqBy file |
    map (x) => { `${x.file} ${x.text}` }'
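
To make the "always JSON objects" assumption concrete, here is a minimal Python sketch of what a grep '^{' prefilter keeps and drops. The function name keep_object_lines and the sample lines are made up for illustration; only the two compiler messages are taken from this thread.

```python
def keep_object_lines(lines):
    """Mimic grep '^{': keep only lines whose first character is '{'."""
    return [line for line in lines if line.startswith("{")]


# Hypothetical mixed output: JSON diagnostics interleaved with plain text.
sample = [
    '{"level": "error", "message": "missing documentation"}',
    "error: Could not compile `foo`.",
    "To learn more, run the command again with --verbose.",
    "[1, 2, 3]",   # valid JSON, but not an object, so it is dropped
    "{not json",   # not valid JSON, but it starts with '{', so it is kept
]

kept = keep_object_lines(sample)
for line in kept:
    print(line)
```

The last two sample lines show the limits of the trick: the filter looks only at the first character, so it neither validates JSON nor keeps non-object records.
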

killercup commented 7 years ago

Heh, using grep '^{' is pretty clever! I'll be doing that!

> How do you think the error recovery should work?

I would not "recover" anything :) I haven't looked into rq's code, but I assume you have an iterator somewhere that you use to read multiple records per input stream (like one JSON document per line). I was thinking of something like --skip-lines-with-invalid-records (or maybe --yolo?) that drops the current record as soon as an error occurs, skips to the end of the line (just a convention) and tries to read the next one. I can see how this might be difficult to implement for all cases; I'm just looking for trivial cases.
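
For the trivial case of one JSON document per line, the idea could look roughly like the Python sketch below. It is not rq code, and the function name parse_lines_skipping_invalid and the sample lines are assumptions made up for illustration.

```python
import json


def parse_lines_skipping_invalid(lines):
    """Yield one parsed value per line, dropping lines that fail to parse."""
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines carry no record
        try:
            yield json.loads(text)
        except json.JSONDecodeError:
            continue  # drop this record and move on to the next line


# Hypothetical input: two JSON records with a plain-text message in between.
lines = [
    '{"level": "error"}',
    "error: Could not compile `foo`.",
    '{"level": "warning"}',
]

records = list(parse_lines_skipping_invalid(lines))
print(records)
```

This only works because each record is assumed to sit on its own line; a record spanning several lines would be dropped piece by piece.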

dflemstr commented 7 years ago

Ah okay. Right now rq doesn't act on lines, but on actual record sequences, so you can have an input like {}{}{}{}{} if you want!
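
To illustrate what reading a record sequence means, as opposed to reading lines, here is a small Python sketch using the standard library's json.JSONDecoder.raw_decode. It is only an analogy for the behaviour described above, not how rq is implemented, and parse_record_sequence is a made-up name.

```python
import json


def parse_record_sequence(text):
    """Decode back-to-back JSON values from one string, skipping whitespace between them."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so step over it here.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        records.append(value)
    return records


print(parse_record_sequence("{}{}{}{}{}"))
print(parse_record_sequence('{"a": 1}\n{"b": 2} [3]'))
```

Since record boundaries come from the JSON grammar rather than from newlines, there is no natural "next line" to resume from after a parse error, which is why skipping to the end of the line is only a convention.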

I'm not convinced it's worth adding this as an option, but if more people request it I'll consider it for sure!

killercup commented 7 years ago

Sounds good. Thanks for your quick replies, the grep, and the far nicer rq code! :)

assafmo commented 6 years ago

This feature will be great!