Apache License 2.0

rjp: Rapid JSON-lines processor

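A fast and simple command-line tool for common operations over JSON-lines files, such as adding, dropping, renaming, and selecting fields, joining or merging streams, and converting to and from TSV.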

You could use jq for some of these tasks (and in fact, jq is a far more general tool) but:

This is my attempt to learn a bit of Rust, so don't take this tool too seriously. That said, it is pretty quick and handy, at least for me.

Build & Installation

Get Rust:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

Clone and build rjp:

git clone https://github.com/ales-t/rjp.git
cd rjp
cargo build --release

You will find the binary in target/release/rjp. You can add it to your PATH, for example:

export PATH="$(pwd)/target/release:$PATH"

Basic usage

rjp < input_file [INPUT_CONVERSION] [PROCESSOR [PROCESSOR...]] [OUTPUT_CONVERSION] > output_file

rjp runs a chain of processors on each instance in the input stream (STDIN), finally printing the processed instances to STDOUT.
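
To make the processing model concrete, here is a minimal Python sketch of a processor chain applied to a JSON-lines stream. It is not rjp's implementation (rjp is written in Rust); the names `run_chain` and `Processor` are hypothetical, and the sketch assumes every processor maps one instance to exactly one instance and that blank lines are skipped.

```python
import json
from typing import Callable, Iterable, Iterator, Sequence

# Hypothetical type: a processor takes one JSON object and returns one.
Processor = Callable[[dict], dict]


def run_chain(lines: Iterable[str], processors: Sequence[Processor]) -> Iterator[str]:
    """Apply each processor, in order, to every JSON line of the input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        instance = json.loads(line)
        for process in processors:
            instance = process(instance)
        yield json.dumps(instance)


if __name__ == "__main__":
    add_source = lambda inst: {**inst, "source": "demo"}
    for out in run_chain(['{"id": 1}', '{"id": 2}'], [add_source]):
        print(out)
```

Because the stream is consumed one line at a time, memory use in this sketch does not grow with the size of the input.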

Input conversions

By default, rjp reads the input file as JSON lines. You can optionally specify a file conversion as the first positional argument.

TSV

Convert TSV lines with specified field names.

Aliases: tsv_to_json, from_tsv
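
The effect of this conversion can be sketched in Python as follows. This illustrates the semantics only, not rjp's command-line syntax or code; the function name `tsv_line_to_json` is hypothetical, and the sketch assumes that all values stay strings and that a line with the wrong number of columns is an error.

```python
import json
from typing import Sequence


def tsv_line_to_json(line: str, field_names: Sequence[str]) -> str:
    """Turn one tab-separated line into a JSON object keyed by field_names."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(field_names):
        # assumption: a column-count mismatch is rejected rather than padded
        raise ValueError(f"expected {len(field_names)} columns, got {len(values)}")
    return json.dumps(dict(zip(field_names, values)))


if __name__ == "__main__":
    print(tsv_line_to_json("1\tfoo\n", ["id", "text"]))
```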

Examples:

Plain text

Conversion from TXT treats the whole input line as a single string field; you need to specify its name.

Aliases: txt_to_json, from_txt
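
As a rough illustration of what this conversion produces (a Python sketch, not rjp's code or argument syntax; `txt_line_to_json` is a hypothetical name, and stripping only the trailing newline is an assumption):

```python
import json


def txt_line_to_json(line: str, field_name: str) -> str:
    """Wrap a whole text line in a JSON object with a single string field."""
    return json.dumps({field_name: line.rstrip("\n")})


if __name__ == "__main__":
    print(txt_line_to_json("hello world\n", "text"))
```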

Examples:

Processors

The following processors are implemented (shorthand aliases are listed for each):

Add fields

Add new fields with constant values.

Aliases: add_fields, af, add

Examples:

Drop fields

Remove existing fields.

Aliases: drop_fields, df, drop

Examples:

Extract items

Extract items from arrays and objects.

Aliases: extract_items, e, extract

Examples:

Join

Perform inner join with another input stream (with optional file conversion).

Note on performance: the main stream is processed line by line, but the stream to join is loaded entirely into RAM, so use the smaller file as the joined stream.

Aliases: join, j, inner_join
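
The performance note corresponds to a classic hash join. The Python sketch below shows the idea under several assumptions that the text above does not confirm: the join uses a single key field with scalar values, a later duplicate key in the joined stream replaces an earlier one, and fields from the joined stream overwrite same-named fields in the main stream. The name `inner_join` is hypothetical and this is not rjp's implementation.

```python
import json
from typing import Iterable, Iterator


def inner_join(main_lines: Iterable[str], joined_lines: Iterable[str], key: str) -> Iterator[dict]:
    """Stream main_lines, keeping only instances whose key exists in joined_lines."""
    # The joined stream is read completely and indexed by key (held in memory).
    index = {}
    for line in joined_lines:
        instance = json.loads(line)
        index[instance[key]] = instance

    # The main stream is processed one line at a time.
    for line in main_lines:
        instance = json.loads(line)
        match = index.get(instance.get(key))
        if match is not None:
            yield {**instance, **match}


if __name__ == "__main__":
    main = ['{"id": 1, "a": "x"}', '{"id": 2, "a": "y"}']
    other = ['{"id": 2, "b": "z"}']
    for row in inner_join(main, other, "id"):
        print(json.dumps(row))
```

Memory use is proportional to the joined stream only, which is why the smaller file belongs on that side.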

Examples:

Left join

Identical to join, except that lines from the main stream that don't have a corresponding instance in the joined stream are kept (and no additional fields are added to them).

Aliases: lj, left_join
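
A left join differs from an inner join only in how unmatched instances are handled. The Python sketch below illustrates this; it assumes a single scalar key field and that joined fields overwrite same-named main-stream fields, neither of which is confirmed above. The name `left_join` is hypothetical and the code is not taken from rjp.

```python
import json
from typing import Iterable, Iterator


def left_join(main_lines: Iterable[str], joined_lines: Iterable[str], key: str) -> Iterator[dict]:
    """Like an inner join, but unmatched main-stream instances pass through unchanged."""
    index = {}
    for line in joined_lines:
        instance = json.loads(line)
        index[instance[key]] = instance

    for line in main_lines:
        instance = json.loads(line)
        match = index.get(instance.get(key))
        # No match: keep the instance as it is, adding no fields.
        yield {**instance, **match} if match is not None else instance


if __name__ == "__main__":
    main = ['{"id": 1, "a": "x"}', '{"id": 2, "a": "y"}']
    other = ['{"id": 2, "b": "z"}']
    for row in left_join(main, other, "id"):
        print(json.dumps(row))
```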

Merge

Merge with another input stream line-by-line, with optional file conversion.

Aliases: merge, mrg
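
Line-by-line merging pairs the n-th instance of one stream with the n-th instance of the other. A minimal Python sketch follows, assuming that both streams have the same number of lines (the sketch stops at the shorter one) and that the second stream's fields win on a name clash; rjp's actual behavior in these cases is not stated above. The name `merge_streams` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def merge_streams(main_lines: Iterable[str], other_lines: Iterable[str]) -> Iterator[dict]:
    """Combine the fields of corresponding lines from two JSON-lines streams."""
    for main_line, other_line in zip(main_lines, other_lines):
        # assumption: on duplicate field names the other stream's value is kept
        yield {**json.loads(main_line), **json.loads(other_line)}


if __name__ == "__main__":
    left = ['{"id": 1}', '{"id": 2}']
    right = ['{"score": 0.5}', '{"score": 0.9}']
    for row in merge_streams(left, right):
        print(json.dumps(row))
```

Unlike a join, no key lookup is involved, so neither stream has to be held in memory.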

Examples:

Rename fields

Rename fields in instances.

Aliases: rename, rnm

Examples:

Select fields

Select a subset of fields (the rest are dropped).

Aliases: select_fields, sf, select, sel

Examples:

To number

Convert a string field to a numeric one.

Aliases: to_number, num
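
One plausible reading of this conversion, sketched in Python: try an integer first and fall back to a float. Whether rjp distinguishes integers from floats, and how it treats unparsable or already-numeric values, is not stated above; those choices here are assumptions, and `to_number` as a Python function is purely illustrative.

```python
def to_number(instance: dict, field: str) -> dict:
    """Return a copy of instance with the string in `field` parsed as a number."""
    raw = instance[field]
    if not isinstance(raw, str):
        return instance  # assumption: non-string values are left untouched
    try:
        value = int(raw)
    except ValueError:
        value = float(raw)  # raises ValueError if the string is not numeric
    return {**instance, field: value}


if __name__ == "__main__":
    print(to_number({"id": "42", "score": "3.5"}, "id"))
    print(to_number({"id": "42", "score": "3.5"}, "score"))
```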

Examples:

Output conversions

By default, rjp will produce JSON lines. You can change that with a file conversion.

TSV

Convert into TSV lines with specified fields.

Aliases: to_tsv, json_to_tsv, tsv
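
The output conversion is the inverse of the TSV input conversion. A Python sketch of the idea follows, assuming that columns appear in the order the fields are given, that a missing field is an error, that non-string values are written in their JSON form, and that tabs inside values are not escaped. None of these details is confirmed above, and `json_line_to_tsv` is a hypothetical name.

```python
import json
from typing import Sequence


def json_line_to_tsv(line: str, fields: Sequence[str]) -> str:
    """Render the selected fields of one JSON line as a tab-separated row."""
    instance = json.loads(line)
    cells = []
    for name in fields:
        value = instance[name]  # assumption: a missing field raises KeyError
        cells.append(value if isinstance(value, str) else json.dumps(value))
    return "\t".join(cells)


if __name__ == "__main__":
    print(json_line_to_tsv('{"id": 1, "text": "foo", "extra": null}', ["text", "id"]))
```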

Examples: