TACC / launcher

A simple utility for executing multiple sequential or multi-threaded applications in a single multi-node batch job
MIT License

Environment passing still fails when variables contain spaces #43

Status: Open. Opened by lwilson 6 years ago.

lwilson commented 6 years ago

From TACC support ticket 39538:

Hello,

I was getting a strange error when submitting my jobs:

env: 02: No such file or directory

I tracked it down to /opt/apps/launcher/3.0.1/pass_env, which (to my understanding) assembles a string of variable declarations that is passed to the ssh call on line 204 of /opt/apps/launcher/3.0.1/paramrun.

The problem occurs when the environment contains variables whose values contain blanks, e.g.:

export MONTH_LIST="01 02 03 04 05 06 07 08 09 10 11 12"

This causes /opt/apps/launcher/3.0.1/pass_env to expand the variable MONTH_LIST as follows (excerpt):

... TACC_ICC_LIB=/opt/apps/intel/16.0.1.150/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64 MONTH_LIST=01 02 03 04 05 06 07 08 09 10 11 12 SLURM_CPUS_ON_NODE=48 ...

Note the missing quotes in the declaration of MONTH_LIST: it is assigned the value "01", and "02 03 ..." is left dangling, so env tries to run "02" as a command, which produces the error reported above.
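The failure can be reproduced locally with a short bash sketch. It assumes that the string built by pass_env is re-parsed by a shell on the remote node (emulated here with bash -c, since ssh hands its command string to the remote user's shell) and that this shell is bash. The helpers build_env_unquoted and build_env_quoted are hypothetical stand-ins, not launcher's actual code. Quoting each value with bash's printf '%q' keeps it a single word when the string is re-parsed.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; build_env_* are illustrative helpers, not launcher code.

export MONTH_LIST="01 02 03 04 05 06 07 08 09 10 11 12"

# Unquoted: join NAME=value pairs into one string without any quoting.
build_env_unquoted() {
  local name out=""
  for name in "$@"; do
    out+="$name=${!name} "   # ${!name}: bash indirect expansion (value of $name)
  done
  printf '%s\n' "$out"
}

# Quoted: printf %q escapes the value (e.g. "01 02" -> 01\ 02) for bash re-parsing.
build_env_quoted() {
  local name out=""
  for name in "$@"; do
    out+="$name=$(printf '%q' "${!name}") "
  done
  printf '%s\n' "$out"
}

unquoted=$(build_env_unquoted MONTH_LIST)
quoted=$(build_env_quoted MONTH_LIST)

# Emulate the remote shell with bash -c.
# Unquoted: env sets MONTH_LIST=01, then tries to execute "02" and fails.
bash -c "env $unquoted true" 2>/dev/null || echo "unquoted: env failed"
# Quoted: the full value survives and printenv shows it.
bash -c "env $quoted printenv MONTH_LIST"
```

Running it prints "unquoted: env failed" followed by the full month list. Note that %q output is intended for bash; values containing control characters such as newlines come out in $'...' form, which a non-bash remote shell would not understand.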

If the AcceptEnv option in sshd_config has not already been considered, I suggest it as a possible solution. Even more convoluted solutions, such as replacing line 204 of /opt/apps/launcher/3.0.1/paramrun with:

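    SSHENV=/tmp/$RANDOM
    # Write the full environment, one NAME=value per line
    env > "$SSHENV"
    echo "LAUNCHER_NHOSTS=$np" >> "$SSHENV"
    echo "LAUNCHER_HOST_ID=$i" >> "$SSHENV"
    # sshd reads ~/.ssh/environment only if PermitUserEnvironment is enabled in sshd_config
    scp "$SSHENV" $host:~/.ssh/environment
    ssh $host "cd $LAUNCHER_WORKDIR; $LAUNCHER_DIR/init_launcher" &
    rm -f "$SSHENV"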

seem more robust than juggling multiple grep calls piped from env.

On my side, I avoid the problem by explicitly clearing these problematic variables in the environment when the job is submitted, but I assume this sort of issue will eventually affect someone else.
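As a sketch of the workaround described above, the bash function below lists exported variables whose values contain whitespace, so they can be reviewed before submitting a job. The name find_blank_vars is hypothetical; the function relies on bash's compgen -e builtin and indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print names of exported variables whose values contain whitespace.
find_blank_vars() {
  local name
  # compgen -e lists the names of exported shell variables (bash builtin).
  for name in $(compgen -e); do
    # ${!name} expands to the value of the variable whose name is in $name.
    if [[ ${!name} == *[[:space:]]* ]]; then
      printf '%s\n' "$name"
    fi
  done
}

export MONTH_LIST="01 02 03 04 05 06 07 08 09 10 11 12"
find_blank_vars
```

Some standard variables (for example SSH_CONNECTION in an ssh session) may also contain spaces, so the output is a list to review rather than something to unset wholesale. Clearing only the offending job variable, e.g. running unset MONTH_LIST before sbatch, mirrors the approach described above.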

Best regards.

João Encarnação