aaronriekenberg / rust-parallel

Fast command line app in rust/tokio to run commands in parallel. Similar interface to GNU parallel or xargs plus useful features. Listed in Awesome Rust utilities.
MIT License

Cartesian product supporting named arguments #16

rtbs-dev closed 4 months ago

rtbs-dev commented 6 months ago

So far working nicely for my benchmark dataset generation task. I need to sweep over some keyword/named arguments (--kwarg X), and I see GNU parallel can potentially do this via --header?

E.g. my custom command is random-graph-walks kind size --n-walks --n-jumps --seed, where the named arguments have default values. I want rust-parallel to sweep over --seed without giving it --n-walks and --n-jumps inputs (they would stay randomized).

rust-parallel -p random-graph-walks ::: tree block ::: 5 10 15

That works just fine to loop over combinations of (kind, size). But if I want to provide --seed options, how would I specify them? This doesn't work:

rust-parallel -p random-graph-walks ::: tree block ::: 5 10 15 ::: --seed 2 5 10

It complains that parameter 'seed' requires an argument (meaning --seed, 2, 5, and 10 each get passed individually as n-walks).
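A rough illustration of why that attempt fails, assuming each ::: group is treated as one dimension of the Cartesian product (which the working two-group command suggests). This is plain bash, not rust-parallel itself; expand_groups is a hypothetical helper that only prints the command lines such a product would contain:

```shell
# Emulate the Cartesian product of three ::: groups with nested loops.
# expand_groups is a hypothetical helper, not part of rust-parallel.
expand_groups() {
  local kind size third
  for kind in tree block; do
    for size in 5 10 15; do
      # "--seed" is just one more value in the third group,
      # so it is never paired with 2, 5, or 10.
      for third in --seed 2 5 10; do
        echo "random-graph-walks $kind $size $third"
      done
    done
  done
}

# Show the first few of the 2 x 3 x 4 = 24 generated command lines.
expand_groups | head -n 4
```

The first printed line ends in a bare --seed with no value, and the following ones pass 2, 5, and 10 as a stray third positional argument, which matches the error described above.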

aaronriekenberg commented 6 months ago
  1. One way to do this currently is to use regular expression capture groups to rewrite the commands. I think something like this should work (using echo for the demo):
rust-parallel --dry-run -r '(.*) (.*) (.*)' echo {1} {2} --seed {3} ::: tree block ::: 5 10 15 ::: 2 5 10

The regex pattern matches each argument group, and {1} {2} {3} are the numbered capture groups. You can write arbitrary commands similar to the above using these.

In your case I think this command should work:

rust-parallel -p -r '(.*) (.*) (.*)' random-graph-walks {1} {2} --seed {3} ::: tree block ::: 5 10 15 ::: 2 5 10
  2. Equivalent to option 1 and maybe more readable: you could use regex named capture groups:
rust-parallel --dry-run -r '(?P<kind>.*) (?P<size>.*) (?P<seed>.*)' echo {kind} {size} --seed {seed} ::: tree block ::: 5 10 15 ::: 2 5 10
rust-parallel -p -r '(?P<kind>.*) (?P<size>.*) (?P<seed>.*)' random-graph-walks {kind} {size} --seed {seed} ::: tree block ::: 5 10 15 ::: 2 5 10
  3. Another way would be to define a bash function and process the arguments in the function. See this example:
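wrapper_function() {
  # Log the received arguments, then forward them with --seed before the third one.
  echo "in wrapper_function args $*"
  # Quote each positional parameter so values are passed through unchanged.
  random-graph-walks "$1" "$2" --seed "$3"
}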

export -f wrapper_function

rust-parallel -p -s wrapper_function ::: tree block ::: 5 10 15 ::: 2 5 10
  4. I am not familiar with the --header option you mentioned for GNU parallel. I will read more on this.

Thanks!
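For anyone who wants to see what the capture-group rewrite in options 1 and 2 amounts to, here is a minimal sketch in plain bash. It assumes the arguments of one combination are joined with single spaces before the pattern is applied, and it uses bash's own regex matching rather than the Rust regex engine, so it only illustrates the idea. rewrite_line is a hypothetical helper:

```shell
# rewrite_line is a hypothetical helper that mimics the idea of -r:
# match one space-joined argument group, then substitute the numbered captures.
rewrite_line() {
  local line=$1
  local pattern='^(.*) (.*) (.*)$'
  if [[ $line =~ $pattern ]]; then
    # BASH_REMATCH[1..3] play the role of {1} {2} {3}.
    echo "random-graph-walks ${BASH_REMATCH[1]} ${BASH_REMATCH[2]} --seed ${BASH_REMATCH[3]}"
  else
    echo "no match: $line" >&2
    return 1
  fi
}

rewrite_line "tree 5 2"      # prints: random-graph-walks tree 5 --seed 2
rewrite_line "block 15 10"   # prints: random-graph-walks block 15 --seed 10
```

The named-group variant works the same way, only with names in place of the positions.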

aaronriekenberg commented 5 months ago

Looks like GNU parallel can use the --header option this way for your example:

parallel --header : random-graph-walks {kind} {size} --seed {seed} ::: kind tree block ::: size 5 10 15 ::: seed 2 5 10

From this reference, --header : lets the first argument after each ::: be a named alias that gets replaced. In this example the names are kind, size, and seed.
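A small plain-bash sketch of that behavior as I understand it: the first element of each group is taken as the name, and the remaining elements are the values swept over. It does not call GNU parallel, and expand_header is a hypothetical helper used only for illustration:

```shell
# expand_header is a hypothetical helper that imitates the described behavior:
# element 0 of each group is the placeholder name, the rest are values.
expand_header() {
  local template='random-graph-walks {kind} {size} --seed {seed}'
  local -a g1=(kind tree block) g2=(size 5 10 15) g3=(seed 2 5 10)
  # Build the placeholders {kind}, {size}, {seed} from the group headers.
  local k1="{${g1[0]}}" k2="{${g2[0]}}" k3="{${g3[0]}}"
  local a b c line
  for a in "${g1[@]:1}"; do
    for b in "${g2[@]:1}"; do
      for c in "${g3[@]:1}"; do
        line=${template//"$k1"/"$a"}
        line=${line//"$k2"/"$b"}
        line=${line//"$k3"/"$c"}
        echo "$line"
      done
    done
  done
}

expand_header   # prints 2 x 3 x 3 = 18 command lines
```

The order in which a parallel runner would actually start these commands is not implied by the loop order here.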

So far rust-parallel does not support a --header option, but I can consider adding it. The regex and bash function options in my previous comment could be workarounds for now.

@rtbs-dev am I understanding your use case correctly? Thanks!

rtbs-dev commented 5 months ago

Awesome, yes! This is a super helpful pattern for parameter sweeps. I've been successfully using the regex version you posted and it definitely works nicely, though something like the header option might be very nice to simplify things.

Not sure if you're wanting feature parity, since imo the header syntax is a little weird for gnu-parallel. Open to any syntactic sugar, really, that implements argument interpolation for combinations of variables.

aaronriekenberg commented 4 months ago

In version 1.18.0 I added support for automatic numbered variable interpolation.

In addition to the previous options, the following will now work:

rust-parallel --dry-run echo {1} {2} --seed {3} ::: tree block ::: 5 10 15 ::: 2 5 10
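To make the substitution itself concrete, here is a minimal plain-bash sketch of numbered interpolation for a single combination. It is not how rust-parallel implements the feature; interpolate is a hypothetical helper that only shows how {1}, {2}, {3} map to the values of one combination:

```shell
# interpolate is a hypothetical helper: the first argument is a template,
# the remaining arguments fill {1}, {2}, ... in order.
interpolate() {
  local result=$1
  shift
  local i=1 arg key
  for arg in "$@"; do
    key="{$i}"
    # Replace every occurrence of the placeholder with the current value.
    result=${result//"$key"/"$arg"}
    i=$((i + 1))
  done
  echo "$result"
}

interpolate 'echo {1} {2} --seed {3}' tree 5 2   # prints: echo tree 5 --seed 2
```

Each combination from the three ::: groups would be filled into the template the same way.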

I may consider adding support for named arguments similar to --header : in GNU parallel in the future, but I am starting with only the above for now :)

Thanks @rtbs-dev !