Closed by yaomz16 3 years ago
Can you fill in the "Performance Issue" info? (Here in this issue)
Your Name:
Your Andrew ID:
Node(s) on which the problem occurred:
Expected Behavior:
Observed Behavior:
Location of Log file Showing the Error:
Location of Script showing [Minimum Working Example]:
Sure!
This looks like the NFS issue again. Try submitting again and see if it works.
Add -w c002
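For anyone unsure where the flag goes: -w is sbatch's short form of --nodelist, which asks Slurm to include the named node in the job's allocation. A minimal sketch of the resulting command follows. The helper name build_sbatch_command is hypothetical, and TrainGPU.sh is used only because it is the script name mentioned in this thread; substitute your own.

```python
import shlex


def build_sbatch_command(script, node):
    """Build an sbatch argument list that requests a specific node.

    Hypothetical helper for illustration; -w is the short form of
    sbatch's --nodelist option.
    """
    return ["sbatch", "-w", node, script]


# Assumed example values: node c002 and the script name from this thread.
cmd = build_sbatch_command("TrainGPU.sh", "c002")
print(shlex.join(cmd))
```

On a login node the list could be handed to subprocess.run(cmd, check=True), or the printed line can simply be typed at the shell.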
I posted a new issue just now
This looks like the NFS issue again. Try submitting again and see if it works.
Nope, my job still fails for the same reason.
Consolidating Info
Your Name: "Archie" Mingze Yao
Your Andrew ID: mingzeya
Node(s) on which the problem occurred: c002
Expected Behavior: Job running normally
Observed Behavior: Failed at once
Location of Log file Showing the Error: /home/mingzeya/Phase_field_project/multi_geometries/python_impl/all_constants_changed/case1/restart_at_epoch_45/restart_at_epoch_60/restart_at_epoch_115/restart_at_epoch_165/error_2997814.err
Location of Script showing [Minimum Working Example]: /home/mingzeya/Phase_field_project/multi_geometries/python_impl/all_constants_changed/case1/restart_at_epoch_45/restart_at_epoch_60/restart_at_epoch_115/restart_at_epoch_165/train.py
Please also attach any logs and the submission script to this issue.
I get the following error in the StdErr file: "slurmstepd: error: execve(): /tmp/slurmd/job2997814/slurm_script: No such file or directory"
If you are not a frequent GitHub user, please also provide us with a contact email here:
Contact Email: amyao@cmu.edu
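To check quickly whether other failed jobs hit the same problem, the slurmstepd message quoted above can be searched for in a job's StdErr file. This is a rough sketch, not an official tool: the function name find_missing_script_errors is hypothetical, and the pattern is built only from the one message format reported in this thread.

```python
import re

# Pattern based on the slurmstepd message quoted in this thread.
EXECVE_ERROR = re.compile(
    r"slurmstepd: error: execve\(\): "
    r"(?P<path>\S+/job(?P<job_id>\d+)/slurm_script): "
    r"No such file or directory"
)


def find_missing_script_errors(stderr_text):
    """Return a (job_id, script_path) tuple for each matching message."""
    return [
        (m.group("job_id"), m.group("path"))
        for m in EXECVE_ERROR.finditer(stderr_text)
    ]


# Sample text copied from the error reported above.
sample = (
    "slurmstepd: error: execve(): /tmp/slurmd/job2997814/slurm_script: "
    "No such file or directory"
)
print(find_missing_script_errors(sample))
```

To scan a real log, read the file's contents (for example with pathlib.Path(...).read_text()) and pass the text to the function.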
FWIW, I'm getting the same error when submitting a CPU job
Your Name: Emil
Your Andrew ID: eannevel
Node(s) on which the problem occurred: f001
Expected Behavior: normal run
Observed Behavior: exited within 10 seconds
Location of Log file Showing the Error: /home/eannevel/ARPA-E/slabmol/logs/error.2997816
Location of Script showing [Minimum Working Example]: no working example
Where are your sbatch scripts? @yaomz16 @emilannevelink
I can reproduce this. Interactive jobs seem fine, though.
My script is at /home/eannevel/ARPA-E/slabmol/scripts/mol_gpaw_MD.sh
My script is at /home/mingzeya/Phase_field_project/multi_geometries/python_impl/all_constants_changed/case1/restart_at_epoch_45/restart_at_epoch_60/restart_at_epoch_115/restart_at_epoch_165/TrainGPU.sh
This should be fixed now.
I submitted a GPU job successfully, but it failed immediately, and I get this error message in the StdErr file: slurmstepd: error: execve(): /tmp/slurmd/job2997813/slurm_script: No such file or directory