From TACC support ticket 39538:
Hello,
I was getting a strange error when submitting my jobs:
env: 02: No such file or directory
I tracked it down to /opt/apps/launcher/3.0.1/pass_env, which (to my understanding) assembles a string of variable declarations that is passed to the ssh call on line 204 of /opt/apps/launcher/3.0.1/paramrun.
The problem occurs when the environment contains variables whose values contain blanks, e.g.:
export MONTH_LIST="01 02 03 04 05 06 07 08 09 10 11 12"
This causes /opt/apps/launcher/3.0.1/pass_env to expand the variable MONTH_LIST as follows (note the MONTH_LIST entry):
... TACC_ICC_LIB=/opt/apps/intel/16.0.1.150/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64 MONTH_LIST=01 02 03 04 05 06 07 08 09 10 11 12 SLURM_CPUS_ON_NODE=48 ...
Note the missing quotes in the declaration of the MONTH_LIST variable. MONTH_LIST is assigned the value "01", and the remaining words "02 03 ..." are left dangling: env takes the first of them, "02", as the command to run, which causes the error reported above.
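To reproduce the failure and illustrate one way around it, here is a minimal bash sketch. It is not what pass_env does: build_env_string is a hypothetical helper, and escaping each value with bash's printf %q is a swapped-in technique that assumes the shell re-parsing the string on the remote side (ssh hands the command line to the remote login shell) is bash. The bash -c calls stand in for that remote shell.

```shell
#!/usr/bin/env bash
# Sketch only: build_env_string is a hypothetical helper, not launcher code.
# It emits NAME=value pairs with each value escaped by bash's printf %q.
build_env_string() {
  local name out=""
  for name in "$@"; do
    out+="$name=$(printf '%q' "${!name}") "   # ${!name}: value of $name
  done
  printf '%s' "${out% }"                      # drop the trailing blank
}

export MONTH_LIST="01 02 03 04 05 06 07 08 09 10 11 12"
export SLURM_CPUS_ON_NODE=48

# Unquoted, as in the report: the re-parsing shell hands "02" to env as
# the command to run, which fails with "No such file or directory".
naive="MONTH_LIST=$MONTH_LIST SLURM_CPUS_ON_NODE=$SLURM_CPUS_ON_NODE"
bash -c "env $naive printenv MONTH_LIST" || echo "naive string failed"

# Escaped: the value survives re-parsing as a single word.
envstr=$(build_env_string MONTH_LIST SLURM_CPUS_ON_NODE)
bash -c "env $envstr printenv MONTH_LIST"
```

The escaped string keeps the single-string interface that paramrun already passes to ssh, so only the way each value is written out changes.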
In case the AcceptEnv option in sshd_config (used together with SendEnv on the ssh client side) has not been considered, I suggest it as a possible solution. Even more convoluted solutions, such as replacing line 204 of /opt/apps/launcher/3.0.1/paramrun with:
SSHENV=/tmp/$RANDOM
env > $SSHENV
echo "LAUNCHER_NHOSTS=$np" >> $SSHENV
echo "LAUNCHER_HOST_ID=$i" >> $SSHENV
scp $SSHENV $host:~/.ssh/environment
ssh $host "cd $LAUNCHER_WORKDIR; $LAUNCHER_DIR/init_launcher" &
rm -f $SSHENV
seem more robust than fiddling with multiple grep calls piped from env.
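The line 204 replacement above depends on sshd reading ~/.ssh/environment, which OpenSSH only does when PermitUserEnvironment is enabled in sshd_config (it is off by default), so it needs an administrator-side change much like AcceptEnv. Raw env output can also contain values spanning several lines (bash exported functions, for instance), which the one NAME=value per line format cannot represent. The sketch below assumes bash and GNU env (for -0); write_ssh_env is a hypothetical helper, mktemp replaces /tmp/$RANDOM, and np and i are placeholder values standing in for the launcher's own variables.

```shell
#!/usr/bin/env bash
# Sketch only: write_ssh_env is a hypothetical helper, not launcher code.
# It writes the environment as one NAME=value per line (the format of
# ~/.ssh/environment) and skips entries whose values contain newlines.
write_ssh_env() {
  local dest=$1 entry
  : > "$dest"                              # create or truncate the file
  # GNU env -0 ends each entry with NUL, so embedded newlines are visible.
  while IFS= read -r -d '' entry; do
    case $entry in
      *$'\n'*) continue ;;                 # cannot fit on one line
    esac
    printf '%s\n' "$entry" >> "$dest"
  done < <(env -0)
}

export MONTH_LIST="01 02 03 04 05 06 07 08 09 10 11 12"
SSHENV=$(mktemp)                           # instead of /tmp/$RANDOM
write_ssh_env "$SSHENV"
np=4 i=1                                   # placeholder values
printf 'LAUNCHER_NHOSTS=%s\nLAUNCHER_HOST_ID=%s\n' "$np" "$i" >> "$SSHENV"
grep '^MONTH_LIST=' "$SSHENV"              # value kept intact, blanks and all
```

Because each entry occupies its own line, blanks inside a value need no quoting at all in this format.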
On my side, I avoid the problem by explicitly nullifying these problematic variables in the environment when the job is submitted, but I assume this sort of issue will eventually come up for someone else.
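The submission-time workaround can be kept out of the interactive shell with a small wrapper. This is a hypothetical sketch, assuming bash and GNU env (for -u), and assuming the job inherits the submission environment, as sbatch does by default. Note that env -u removes the variables rather than setting them to empty; either way pass_env no longer sees a value with blanks. run_without and job.slurm are illustrative names, not part of the launcher or of this report.

```shell
#!/usr/bin/env bash
# Sketch only: run_without is a hypothetical wrapper, not launcher code.
# Usage: run_without VAR... -- command [args...]
# Runs the command with each VAR removed from its environment (GNU env -u),
# leaving the calling shell's own environment untouched.
run_without() {
  local unset_args=()
  while [ $# -gt 0 ] && [ "$1" != "--" ]; do
    unset_args+=(-u "$1")
    shift
  done
  [ "${1-}" = "--" ] && shift
  env "${unset_args[@]}" "$@"
}

export MONTH_LIST="01 02 03 04 05 06 07 08 09 10 11 12"
# e.g. run_without MONTH_LIST -- sbatch job.slurm   (job.slurm is hypothetical)
run_without MONTH_LIST -- printenv MONTH_LIST || echo "MONTH_LIST not passed"
printenv MONTH_LIST                          # still set in this shell
```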
Best regards.
João Encarnação