Closed antonelepfl closed 4 years ago
Hi Stefano,
I'm not really a Piz Daint expert, but please have a look at the file bsssubmit* in the working directory folder /scratch/snx3000/unicore/FILESPACE/9a2ed0d2-6d74-4c72-87d3-26801e073094
If this looks OK, a full output of "env" in both cases (UNICORE vs ssh & salloc) might help to track down why the UNICORE launch is not behaving as expected.
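A rough sketch of how the two "env" outputs suggested above could be captured and compared. The helper names dump_env and compare_env and the file names are made up for illustration; nothing here is specific to UNICORE or Piz Daint.

```shell
#!/bin/sh
# Hypothetical helper: dump the sorted environment to the given file.
dump_env() {
    env | sort > "$1"
}

# Hypothetical helper: show only the lines that differ between two dumps.
# diff exits non-zero when the files differ, so mask that with "|| true".
compare_env() {
    diff "$1" "$2" || true
}

# Intended use (file names are placeholders):
#   dump_env env_unicore.txt    # inside the UNICORE-submitted job
#   dump_env env_salloc.txt     # inside the ssh + salloc session
#   compare_env env_unicore.txt env_salloc.txt
```

Sorting before writing keeps the diff limited to variables that really differ, rather than ones that only appear in a different order.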
Sam from prgenv-rt@cscs.ch replied:
Note that after salloc, you still need to submit your script with srun in order to have it executed on the compute nodes. However, I don't think that your script will fail on the compute nodes when submitted that way. I think the error you see is rather related to the way UNICORE submits jobs, which leaves it unable to access Daint's environment properly. I will come back to you as soon as I have more insights.
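A minimal sketch of the salloc-then-srun pattern mentioned above. The wrapper run_on_compute is hypothetical, the resource values are placeholders, and the local fallback only exists so the snippet can be tried on a machine without Slurm.

```shell
#!/bin/sh
# Hypothetical wrapper: launch a command through srun when Slurm is
# available, otherwise run it directly so the sketch can be tried anywhere.
run_on_compute() {
    if command -v srun >/dev/null 2>&1; then
        srun "$@"
    else
        echo "srun not found, running locally: $*" >&2
        "$@"
    fi
}

# Intended use on the cluster (resource values are placeholders):
#   salloc --nodes=1 --time=00:10:00   # opens a shell holding the allocation
#   run_on_compute ./input.sh          # executes on the allocated compute node
```

Without the srun step, a script started inside the salloc shell typically runs on the node where salloc was called, not on the allocated compute nodes.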
So they are working on it.
A new update on this:
It looks to me like UNICORE, for some reason, erases the default content of the MODULEPATH environment variable instead of prepending to it. Thus, the modules daint-mc and PyExtensions cannot be found. Can you print the MODULEPATH variable when submitting a job from UNICORE and report what you get?
To which I replied:
echo $MODULEPATH
outputs /opt/cray/ari/modulefiles:/opt/cray/pe/craype/default/modulefiles:/opt/cray/pe/modulefiles:/opt/cray/modulefiles:/opt/modulefiles
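To illustrate the difference between erasing MODULEPATH and prepending to it, here is a small sketch. The variable names are made up, the extra directory is hypothetical, and the two "default" directories are only stand-ins borrowed from the output above; whether they match Daint's real default is an assumption. The snippet deliberately does not touch the real MODULEPATH.

```shell
#!/bin/sh
# Stand-in for the site default (assumption, shortened for readability).
default_path="/opt/cray/modulefiles:/opt/modulefiles"
# Hypothetical directory a submission layer might want to add.
extra_dir="/hypothetical/unicore/modulefiles"

# Erasing: the default directories are lost, so modules that live
# there can no longer be found.
overwritten="$extra_dir"

# Prepending: the new directory is searched first and the defaults survive.
# The ${var:+...} form avoids a trailing ':' when the default is empty.
prepended="$extra_dir${default_path:+:$default_path}"

echo "overwritten: $overwritten"
echo "prepended:   $prepended"
```

With the Environment Modules tooling, running "module use" with a directory is the usual way to prepend it to MODULEPATH without assigning the variable by hand.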
I have the same issue. When I submit with sbatch bsssubmit* everything works perfectly though.
@clupascu could you please check if the modules are loaded correctly now?
@antonelepfl It seems everything works perfectly now. I think we can close this issue. Do you know what caused this problem? Just so we remember it for the future...
Thank you Carmen for checking! I'm not sure what the solution was. I asked Fabio and I'll post some details if he provides them.
Fabio replied with the changes that he made:
I launched a job with this config
Where
input.sh
But I get
But if I access the machine, allocate some resources, and run like
I don't get those errors and it works fine.
For example in this folder:
/scratch/snx3000/unicore/FILESPACE/9a2ed0d2-6d74-4c72-87d3-26801e073094