sbrozell opened 3 years ago
That's because jobs submitted from the web interface are submitted from a web node at OSC, while jobs submitted via SSH are submitted from a login node. They will not, and cannot, have the same environment; for example, web nodes do not need, and will not have, Lmod installed, because it is not needed to submit jobs, only to run them.
All jobs, whatever their submission source, should have the same runtime environment. User jobs coming from OnDemand do not have the same runtime environment and are failing.
Ah yes, the runtime environment should be the same as long as the job is submitted with `--export=NONE` in SLURM. The environment at submit time will differ, but the SLURM job startup environment will be the same. The only way to guarantee the same environment with SLURM is to ignore the submit environment, i.e. use `--export=NONE`. The default behavior with SLURM is to take the submit environment and apply it to the job environment, i.e. `--export=ALL`.
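As a minimal sketch of the two behaviors described above (`job.sh` is a hypothetical batch script; flags are standard `sbatch` options):

```shell
# Default: the job inherits whatever environment the submitting process had,
# so web-node submissions and login-node submissions can diverge.
sbatch --export=ALL job.sh

# Clean start: the submit-time environment is ignored, so the job's startup
# environment is the same regardless of where it was submitted from.
sbatch --export=NONE job.sh
```

With `--export=NONE`, any modules or variables the job needs must be set up inside the job script itself rather than inherited.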
Any update on this issue? I'd like to close some old incidents.
FWIW, there is a recent ticket, INC0369463, with divergent OnDemand and SSH batch job behavior.
Thanks for reminding me of this ticket. We patched 2.0 with the `copy_environment` flag. Users can check that flag, and then we will submit the job with `--export=ALL`; within their job they should then be able to use `srun` or similar effectively.
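A minimal sketch of what a job submitted this way might look like (`job.sh` contents below are hypothetical; the `--export=ALL` directive mirrors what the `copy_environment` checkbox is described as doing):

```shell
#!/bin/bash
#SBATCH --export=ALL          # inherit the submit-time environment (copy_environment behavior)
#SBATCH --nodes=1

# Because the compute node has Lmod, modules can still be loaded here
# even though the web node that submitted the job does not have Lmod.
module load mymodule          # hypothetical module name

srun ./my_app                 # srun launches within the job's environment
```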
I have requested that the user try that and report the results.
I am resolving INC0369463 since it's been a month w/o user response.
I am resolving INC0357151 since the user had workarounds.
Jobs from the job composer do not seem to have the same environment as jobs submitted from the command line in an ssh session. The two most recent tickets are INC0357151 and INC0356547. There is also an asana task: https://app.asana.com/0/1166442278779601/1200350097179233/f