Goodman-lab / DP5

Python workflow for DP5 and DP4 analysis of organic molecules
Other

Timeouts #60

Open flatstik opened 2 years ago

flatstik commented 2 years ago

I keep getting errors like this, even with not-so-complex molecules:

  File "/scratch/project_2003067/DP5/DP5/PyDP4.py", line 841, in <module>
    main(settings)
  File "/scratch/project_2003067/DP5/DP5/PyDP4.py", line 317, in main
    Isomers = DFT.RunOptCalcs(Isomers, settings)
  File "/scratch/project_2003067/DP5/DP5/Gaussian.py", line 254, in RunOptCalcs
    Completed = RunCalcs(GausJobs, settings)
  File "/scratch/project_2003067/DP5/DP5/Gaussian.py", line 308, in RunCalcs
    outp = subprocess.check_output(GausPrefix + " < "  + f + ' > ' + f[:-3] + 'out', shell=True)
  File "/scratch/project_2003067/DP5/DP5_env/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/scratch/project_2003067/DP5/DP5_env/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/appl/soft/chem/gaussian/G16RevC.01_new/g16/g16 < Smiles_Mol_2_0ginp008.com > Smiles_Mol_2_0ginp008.out' returned non-zero exit status 1.

@HowarthA, is there any way to increase the timeout?
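A detail worth noting when reading the traceback above: `subprocess.check_output` raises `subprocess.TimeoutExpired` when its `timeout` argument is exceeded, and `subprocess.CalledProcessError` when the command finishes with a non-zero exit status. The error shown is the latter, which suggests that Gaussian itself exited with status 1 rather than Python cutting the job off. A minimal sketch that tells the two cases apart, using stand-in shell commands (assumptions, not the real `g16` call):

```python
import subprocess
from typing import Optional


def classify_run(cmd: str, timeout: Optional[float] = None) -> str:
    """Run a shell command and report how it ended: "ok", "timeout" or "exit <code>"."""
    try:
        subprocess.check_output(cmd, shell=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Raised only when the timeout= limit is exceeded.
        return "timeout"
    except subprocess.CalledProcessError as err:
        # Raised when the command finishes on its own but returns a non-zero
        # status; this is the exception in the traceback above.
        return f"exit {err.returncode}"
    return "ok"


# Placeholder commands standing in for a Gaussian run.
print(classify_run("true"))
print(classify_run("exit 1"))
print(classify_run("sleep 5", timeout=0.5))
```

Under this reading, raising the timeout would not change the outcome, because no timeout was hit in the first place.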

HowarthA commented 2 years ago

How long does the Gaussian job run before this error is generated? Does the job run at all? Can you run this Gaussian job outside of DP5 on the same machine without any similar issues?
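One way to check why the job stopped is to look at the end of the `.out` file that the failing command writes (for example `Smiles_Mol_2_0ginp008.out`). Gaussian normally ends its log with a `Normal termination` line, or with an `Error termination` line when a link fails. A minimal sketch, assuming those marker strings; the helper name `gaussian_status` is hypothetical:

```python
from pathlib import Path


def gaussian_status(out_file: Path, tail_lines: int = 20) -> str:
    """Classify a Gaussian log as "normal", "error", "incomplete" or "missing".

    Assumes the usual Gaussian end-of-job messages. A log with neither
    message is treated as incomplete, e.g. a job that was killed mid-run.
    """
    if not out_file.exists():
        return "missing"
    # Only the last few lines matter; the termination message comes at the end.
    lines = out_file.read_text(errors="replace").splitlines()[-tail_lines:]
    tail = "\n".join(lines)
    if "Normal termination" in tail:
        return "normal"
    if "Error termination" in tail:
        return "error"
    return "incomplete"
```

For example, `gaussian_status(Path("Smiles_Mol_2_0ginp008.out"))` returning `"error"` would point to a problem inside Gaussian (input, memory or disk settings), while `"incomplete"` would suggest the process was killed from outside.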

flatstik commented 2 years ago

Something like 30 to 180 minutes; I haven't been timing it closely. I can run the Gaussian job on the same machine without any issues.

HowarthA commented 2 years ago

OK, sure, I'll have a look into this; it's not a problem I've come across before. If you rerun the calculation, DP5 should pick up the geometry optimisation from where it left off after the timeout.
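The simplest part of that restart behaviour is deciding which inputs still need to run. DP5's own logic may well differ (and may also reuse the last geometry from a partial log); this is only a hypothetical sketch in which `pending_jobs` is an assumed helper and a finished job is recognised by a `Normal termination` line in its `.out` file:

```python
from pathlib import Path
from typing import List


def finished_normally(out_file: Path) -> bool:
    # Assumption: a completed Gaussian job ends with a "Normal termination" line.
    if not out_file.exists():
        return False
    tail = out_file.read_text(errors="replace").splitlines()[-20:]
    return any("Normal termination" in line for line in tail)


def pending_jobs(com_files: List[Path]) -> List[Path]:
    """Return the .com inputs whose matching .out file is missing or unfinished."""
    # with_suffix(".out") mirrors the f[:-3] + 'out' naming in the traceback.
    return [f for f in com_files if not finished_normally(f.with_suffix(".out"))]
```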

HowarthA commented 2 years ago

I've made a change that may fix this; give it a try.

flatstik commented 2 years ago

Still the same issue, but it did finish the first calculation. With timeout=86400:

/appl/soft/chem/gaussian/G16RevC.01_new/g16/g16 < Smiles_Mol_0_1ginp001.com > Smiles_Mol_0_1ginp001.out
Gaussian job 1 of 318 completed.
/appl/soft/chem/gaussian/G16RevC.01_new/g16/g16 < Smiles_Mol_0_1ginp002.com > Smiles_Mol_0_1ginp002.out
Traceback (most recent call last):
  File "/scratch/project_2003067/DP5/DP5/PyDP4.py", line 841, in <module>
    main(settings)
  File "/scratch/project_2003067/DP5/DP5/PyDP4.py", line 317, in main
    Isomers = DFT.RunOptCalcs(Isomers, settings)
  File "/scratch/project_2003067/DP5/DP5/Gaussian.py", line 254, in RunOptCalcs
    Completed = RunCalcs(GausJobs, settings)
  File "/scratch/project_2003067/DP5/DP5/Gaussian.py", line 308, in RunCalcs
    outp = subprocess.check_output(GausPrefix + " < "  + f + ' > ' + f[:-3] + 'out', shell=True,timeout= 86400)
  File "/scratch/project_2003067/DP5/DP5_env/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/scratch/project_2003067/DP5/DP5_env/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/appl/soft/chem/gaussian/G16RevC.01_new/g16/g16 < Smiles_Mol_0_1ginp002.com > Smiles_Mol_0_1ginp002.out' returned non-zero exit status 1.
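The second traceback shows that setting `timeout=86400` (24 hours) made no difference: the failure is still a `CalledProcessError` on job 2 of 318, and because `check_output` raises on it, the whole batch stops. For long batches, one hypothetical alternative is to record each job's exit status and keep going, so a single failing input does not abort the remaining jobs. This is a sketch, not DP5's code; `run_batch` and the stand-in commands are assumptions:

```python
import subprocess
from typing import List, Optional, Tuple


def run_batch(commands: List[str], timeout: float = 86400) -> List[Tuple[str, Optional[int]]]:
    """Run shell commands one after another and record how each one ended.

    subprocess.run() without check=True does not raise on a non-zero exit
    status, so a failing job is logged and the loop moves on to the next one.
    A job that exceeds `timeout` seconds is recorded with status None.
    """
    results: List[Tuple[str, Optional[int]]] = []
    for n, cmd in enumerate(commands, start=1):
        try:
            proc = subprocess.run(cmd, shell=True, timeout=timeout,
                                  stderr=subprocess.PIPE, text=True)
            code: Optional[int] = proc.returncode
            if code != 0 and proc.stderr:
                # Show whatever the program wrote to stderr before failing.
                print(proc.stderr.strip())
        except subprocess.TimeoutExpired:
            code = None
        results.append((cmd, code))
        if code == 0:
            status = "completed"
        elif code is None:
            status = "timed out"
        else:
            status = f"failed with exit status {code}"
        print(f"Job {n} of {len(commands)} {status}: {cmd}")
    return results


# Stand-in commands; a real run would pass the g16 command lines instead.
run_batch(["true", "exit 1", "echo still running"])
```

The failed entries can then be inspected or resubmitted separately instead of restarting the full set of 318 jobs.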

flatstik commented 2 years ago

Update: it only hangs with -gnomesw, not with -gnmesw.

flatstik commented 2 years ago

And after two weeks of preparing all the isomers for two SMILES strings, even the -gnmesw run fails because it hits the memory limit (which I cannot increase any further):

Reading experimental NMR data...
[PosixPath('Proton'), PosixPath('Carbon')]
Processing Proton Spectrum
slurmstepd: error: StepId=10077369.0 exceeded memory limit (67121145856 > 67108864000), being killed
srun: Exceeded job memory limit
slurmstepd: error: *** STEP 10077369.0 ON r07c52 CANCELLED AT 2022-01-08T02:02:11 ***
slurmstepd: error: StepId=10077369.0 exceeded memory limit (67121145856 > 67108864000), being killed
srun: Exceeded job memory limit
srun: Exceeded job memory limit
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: r07c52: task 0: Killed
srun: launch/slurm: _step_signal: Terminating StepId=10077369.0
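For reference, the numbers in the Slurm message are in bytes. A quick conversion shows that the step limit is 64000 MiB (62.5 GiB) and that the job was killed once its measured usage went about 11.7 MiB past that allocation:

```python
# Byte counts copied from the slurmstepd error message above.
used = 67_121_145_856   # bytes in use when the step was killed
limit = 67_108_864_000  # memory limit for the step, in bytes

MIB = 1024 ** 2
GIB = 1024 ** 3

print(f"limit: {limit / MIB:.0f} MiB = {limit / GIB:.1f} GiB")
print(f"used:  {used / GIB:.3f} GiB")
print(f"over by {(used - limit) / MIB:.1f} MiB")
```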