anki-code / xontrib-pipeliner

Let your pipe lines flow thru the Python code in xonsh.
BSD 2-Clause "Simplified" License

Easily process lines using pipes in the xonsh shell. Multicore processing is supported.

If you like the idea of pipeliner click ⭐ on the repo and tweet.

Install

xpip install -U xontrib-pipeliner
echo 'xontrib load pipeliner' >> ~/.xonshrc
# Reload xonsh

Usage

Let your pipe lines flow thru the Python code:

<cmd> | <...> | pl "<preset name or lambda expression>" | <cmd> | <...>

Experimental features are described in the Experimental section below.

Examples

Presets

pl  # list of presets

echo "  1" | pl strip
# 1

echo "1,2,3" | pl split ,
# ['1', '2', '3']

echo "a,b,c" | pl split , | pl fromlist 0
# a

echo xonsh pids is $(ps ax | grep xonsh | grep -v grep | pl split ' ' | pl fromlist 0)
# xonsh pids is 56486 56913 56489

You can set your own presets:

$XONTRIB_PIPELINER_PRESETS = {
    "upper": "line.upper()",
    "repeat": lambda line, num, args: line * int(args[0])
}

echo 'hello' | pl upper
# HELLO

echo 'hey \nhi ' | pl repeat 3
# hey hey hey
# hi hi hi
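
The two preset styles above (a string expression and a callable taking line, num, args) can be illustrated in plain Python. This is only a sketch of the idea, not the xontrib's implementation: the PRESETS dictionary and the apply_preset helper are hypothetical names introduced here.

```python
# Hypothetical sketch of applying a preset table to a single line.
# PRESETS and apply_preset are illustrative names, not part of xontrib-pipeliner.

PRESETS = {
    "upper": "line.upper()",                                # string expression
    "repeat": lambda line, num, args: line * int(args[0]),  # callable preset
}


def apply_preset(name, line, num, args=()):
    """Apply the preset called `name` to one line and return the result."""
    preset = PRESETS[name]
    if callable(preset):
        # Callable presets receive the line, its number and the extra arguments.
        return preset(line, num, list(args))
    # String presets are evaluated with `line` and `num` in scope.
    return eval(preset, {"line": line, "num": num})


print(apply_preset("upper", "hello", 0))              # HELLO
print(repr(apply_preset("repeat", "hi ", 1, ["3"])))  # 'hi hi hi '
```

Extra words after the preset name (the 3 in pl repeat 3) arrive as strings, which is why the repeat preset converts args[0] with int().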

Lambda string

Two variables are available in the lambda expression: line (the current input line) and num (the line number, starting from 0).

Modifying lines the Python way

ls -1 / | pl "line + ' is here'" | head -n 3
bin is here
boot is here
dev is here

Line number

ls -1 / | head -n 4 | pl "f'{num} {line}'"
0 bin
1 boot
2 cdrom
3 dev

Ignore line

ls -1 / | head -n 4 | pl "f'{num} {line}' if num%2 == 0 else None"
0 bin
2 cdrom
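
The behaviour shown so far (the expression sees line and num, and a None result drops the line) can be modelled in a few lines of Python. This is a minimal sketch under the assumption that the expression is evaluated once per line; run_pipeline is a hypothetical helper, not a function of the xontrib.

```python
def run_pipeline(lines, expression, context=None):
    """Evaluate `expression` for every line; skip lines whose result is None.

    Hypothetical model of the per-line evaluation, for illustration only.
    """
    results = []
    for num, line in enumerate(lines):
        # Each evaluation gets a fresh scope with `line` and `num` defined.
        scope = dict(context or {})
        scope.update(line=line, num=num)
        value = eval(expression, scope)
        if value is None:
            continue  # returning None ignores the line
        results.append(str(value))
    return results


names = ["bin", "boot", "cdrom", "dev"]
print(run_pipeline(names, "f'{num} {line}' if num % 2 == 0 else None"))
# ['0 bin', '2 cdrom']
```

The optional context argument stands in for names that are already defined in the session, such as an imported module.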

Splitting

cat /etc/passwd | head -n 3 | pl "line.split(':')[6]"
/bin/bash
/usr/sbin/nologin
/usr/sbin/nologin

Imports

import re
cat /etc/passwd | head -n 3 | pl "re.sub('/bin/bash', '/usr/bin/xonsh', line)"
root:x:0:0:root:/root:/usr/bin/xonsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin

Arrays

cat /etc/passwd | head -n 3 | pl "line.split(':')" | grep nologin | pl "':'.join(eval(line)[::-1])"
/usr/sbin/nologin:/usr/sbin:daemon:1:1:x:daemon
/usr/sbin/nologin:/bin:bin:2:2:x:bin
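
In the Arrays example the list travels between the two pl stages as text, so the second stage has to parse it back before reversing it. The sketch below reproduces that round trip in plain Python. It uses ast.literal_eval instead of eval, which is a substitution made here because it only accepts literals; the source example uses eval.

```python
import ast

line = "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"

# First stage: split the line; the list is passed on as its text form.
fields = line.split(":")
as_text = str(fields)

# Second stage: parse the text back into a list, then reverse and join it.
restored = ast.literal_eval(as_text)
reversed_line = ":".join(restored[::-1])

print(reversed_line)
# /usr/sbin/nologin:/usr/sbin:daemon:1:1:x:daemon
```

Any text-processing command placed between the two stages (grep in the example above) sees the list as an ordinary line of text.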

Python head

pl "'\\n'.join(list('ABCDEFG'))" | pl "line + ('!' if num%2 else '?')" | grep '!'
B!
D!
F!

Variables and operations chaining

The expression is a lambda function, so statements are not allowed in it. On Python 3.8+ you can still define variables and chain operations with a small trick: put walrus-operator assignments into a list and take the last element as the result:

ls -1 / | head -n3 | pl "[s:='b', line.replace(s, s.upper()+')')][-1]"
B)in
B)oot
dev
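
The same trick works in any Python 3.8+ lambda, which makes it easy to try outside the shell. In this sketch the name transform is illustrative, and the assignment is wrapped in parentheses purely for readability.

```python
# A lambda body must be a single expression, so the assignment is done with
# the walrus operator inside a list, and [-1] selects the final value.
transform = lambda line: [(s := "b"), line.replace(s, s.upper() + ")")][-1]

for name in ["bin", "boot", "dev"]:
    print(transform(name))
# B)in
# B)oot
# dev
```

List elements are evaluated left to right, so every element can use names assigned by the elements before it.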

Execute command with the line

ls / | head -n 3 | pl "execx('du -sh /'+line) or 'Done command with /'+line"
0       /bin
Done command with /bin
840M    /boot
Done command with /boot
4,0K    /cdrom
Done command with /cdrom

Note! If you operate on files (for example pl "execx(f'mv {line} prefix-{line}')"), you may get a "TypeError: an integer is required" error, which is caused by insufficient access rights to the files. Fix the permissions with chmod and chown before pipelining.

Wrap pipeliner to get your own magic

aliases['my_lovely_pl'] = lambda a,i,o: aliases['pl'](["'My lovely ' + "+a[0]], i, o)
aliases['my_parallel_ppl'] = lambda a,i,o: aliases['ppl'](["'My parallel ' + "+a[0]], i, o)
ls / | head -n 3 | my_lovely_pl "line + '!'"
# My lovely bin!
# My lovely boot!
# My lovely cdrom!

ls / | head -n 3 | my_parallel_ppl "line + '!'"
# My parallel boot!
# My parallel cdrom!
# My parallel bin!
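
The wrappers above work by building a new expression string from a fixed prefix and the user's expression. The sketch below shows that string composition in plain Python; wrap_expression is a hypothetical helper. Unlike the aliases above, it puts the user's expression in parentheses, which is a design choice made here so that conditional expressions are not regrouped by operator precedence.

```python
def wrap_expression(prefix, expression):
    """Build an expression that prepends `prefix` to the wrapped result."""
    # repr() turns the prefix into a quoted Python string literal.
    return repr(prefix) + " + (" + expression + ")"


expr = wrap_expression("My lovely ", "line + '!'")
print(expr)                          # 'My lovely ' + (line + '!')
print(eval(expr, {"line": "bin"}))   # My lovely bin!
```

Without the parentheses, an expression such as "line if num else 'first'" would be read as ('My lovely ' + line) if num else 'first'.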

Add your most useful solutions to xontrib-pipeliner. PRs are welcome!

Experimental

Syntax highlighting using xonsh prompt

If you use the xonsh prompt and want syntax highlighting for the expression instead of writing it as a string, there is an experimental feature that catches pl @(<python>) calls and uses the expression inside the xonsh Python substitution as the pipeliner argument. Example:

echo echo | pl @(line + '!')
# In the xonsh prompt this is equivalent to:
echo echo | pl "line + '!'"

Syntax highlighting using xonsh macros

To avoid writing Python inside a string and still get syntax highlighting, there is a trick that uses a xonsh macro:

def py(code):
    return code

echo 123 | pl @(py!(line + '2'))

Multicore pipelining

By default pipeliner uses one CPU core. To use all cores in parallel, try the ppl command:

head /etc/passwd | ppl "str(num) + ' ' + line.split(':')[0]"
1 daemon
0 root
2 bin
4 sync
5 games
8 mail
9 news
6 man
7 lp
3 sys

Note! The order of the result lines is unpredictable because the lines are processed in parallel. The num variable still contains the original line number.
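
The effect described in the note can be reproduced in plain Python: each job is submitted together with its original line number, and results are collected as they finish. The sketch uses a thread pool from concurrent.futures only to stay short and portable; that is an assumption of this example and says nothing about how ppl distributes work across cores. The names process and run_parallel are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process(num, line):
    """Same transformation as the ppl example: line number plus user name."""
    return f"{num} {line.split(':')[0]}"


def run_parallel(lines, workers=4):
    """Process lines concurrently and return results in completion order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The line number is fixed at submission time, before any reordering.
        futures = [pool.submit(process, num, line)
                   for num, line in enumerate(lines)]
        return [future.result() for future in as_completed(futures)]


passwd = ["root:x:0:0", "daemon:x:1:1", "bin:x:2:2", "sys:x:3:3"]
for result in run_parallel(passwd):
    print(result)  # order may differ between runs
```

Because num is attached before the work is distributed, the original order can be restored afterwards by sorting on it.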

Pipeliner exec

The plx and pplx commands are a shorter way to run execx(f"{plx_command}") for every line.

For example, when you want to rename files you can do it the Pythonic way:

mkdir -p /tmp/plx-test && cd /tmp/plx-test
touch 111 222 333 && ls
# 111 222 333

ls | plx "mv {line} prefix-{line}"
# mv 111 prefix-111
# mv 222 prefix-222
# mv 333 prefix-333

ls
# prefix-111 prefix-222 prefix-333

Echo example:

ls | plx 'echo {line} # {num}'
# echo prefix-111 # 0
# prefix-111
# echo prefix-222 # 1
# prefix-222
# echo prefix-333 # 2
# prefix-333
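
The plx examples fill {line} and {num} into a command template before running it. The sketch below shows only that substitution step, using str.format as a stand-in for the f-string evaluation mentioned above; render_commands is a hypothetical helper and nothing is executed.

```python
def render_commands(lines, template):
    """Fill {line} and {num} into the template for every input line."""
    return [template.format(line=line, num=num)
            for num, line in enumerate(lines)]


for command in render_commands(["111", "222", "333"], "mv {line} prefix-{line}"):
    print(command)
# mv 111 prefix-111
# mv 222 prefix-222
# mv 333 prefix-333
```

Printing the rendered commands first is a cheap dry run before letting a command such as mv touch real files.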

Pipeliner in xsh scripts

By default, xsh scripts do not load an rc file, so no xontribs are loaded. To use pipeliner in a script, run xontrib load pipeliner before the first use.

Known issues in experimental functions

plx: "Bad file descriptor" on a huge number of lines

https://github.com/xonsh/xonsh/issues/4224

ppl: On macOS, global variables are not accessible from child processes in multicore pipelining

On macOS you can't access the xonsh context (global variables and functions) in the expression. PRs are welcome!

ppl: On macOS, multicore pipelining freezes at the end

The workaround is to add cat at the end: echo 1 | ppl 'line' | cat. PRs are welcome!

Future

Pipeliner should be a part of xonsh and have a shortcut and syntax highlighting. For example:

echo 'Pipeliner should be ' | pl @{line + 'a part of xonsh!'}
# or
echo 'Pipeliner should be ' | ~(line + 'a part of xonsh!')
Pipeliner should be a part of xonsh!

If you want to see this supported in xonsh, add your like and a supporting message to "Python code substitution in subproc mode".

Links