ReactionMechanismGenerator / RMG-Java

The Java version of RMG: Reaction Mechanism Generator
http://rmg.sourceforge.net/
MIT License

Gaussian freezes job when running in QM mode #150

Open rwest opened 13 years ago

rwest commented 13 years ago

A couple of my jobs made no progress over the weekend. On logging in to the compute node where they were running and typing top, I discovered that the Gaussian program l103.exe had been running for ~5000 minutes without making progress. In this scenario it would be good for RMG to just kill the process and carry on with the next attempt.

top - 15:28:31 up 159 days,  3:50,  1 user,  load average: 2.25, 2.17, 2.06
Tasks: 180 total,   3 running, 177 sleeping,   0 stopped,   0 zombie
Cpu(s): 28.2%us,  0.0%sy,  0.0%ni, 71.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16465776k total, 13075504k used,  3390272k free,   342396k buffers
Swap: 11999224k total,     3140k used, 11996084k free,  9402620k cached
PID to kill: 7279
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND             
 7279 rwest     20   0  139m  53m 1044 R  100  0.3   5015:43 l103.exe             
17585 rwest     20   0  139m  53m 1044 R  100  0.3   4996:03 l103.exe             
 9839 rwest     20   0 2956m 1.3g 8908 S    0  8.1  75:18.92 java

rwest commented 13 years ago

Here are the input files for two jobs that have now been frozen in l103.exe for ~15 hours. Notice that they are the same molecule.

rwest@node47:/tmp/60336.1.long2/QMfiles$ cat ASIRPOSRZLOBKS-UHFFFAOYAJ.gjf
%chk=/tmp/60336.1.long2/QMfiles/RMGrunCHKfile.chk
%mem=6MW
%nproc=1
# pm3 opt=(tight,nolinear,calcfc,small,maxcyc=200) freq IOP(2/16=3)

 InChI=1/C2O4/c3-1-2(3,4-1)6-5-1

0  1
O          -0.91470        -1.09320         0.11030
C          -0.28610        -0.01190        -0.75400
O           1.17360        -0.11080        -0.65280
C          -0.31180         0.04870         0.62450
O          -0.75130         1.21630         0.01190
O           1.09030        -0.04900         0.66010
rwest@node47:/tmp/60367.1.long2/QMfiles$ cat ASIRPOSRZLOBKS-UHFFFAOYAJ.gjf
%chk=/tmp/60367.1.long2/QMfiles/RMGrunCHKfile.chk
%mem=6MW
%nproc=1
# pm3 opt=(tight,nolinear,calcfc,small,maxcyc=200) freq IOP(2/16=3)

 InChI=1/C2O4/c3-1-2(3,4-1)6-5-1

0  1
O          -0.91470        -1.09320         0.11030
C          -0.28610        -0.01190        -0.75400
O           1.17360        -0.11080        -0.65280
C          -0.31180         0.04870         0.62450
O          -0.75130         1.21630         0.01190
O           1.09030        -0.04900         0.66010

and here's the 'top' output:

top - 12:32:41 up 162 days, 54 min,  1 user,  load average: 2.02, 2.02, 2.00
Tasks: 180 total,   3 running, 177 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy, 27.1%ni, 72.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16465776k total,  9774520k used,  6691256k free,   172912k buffers
Swap: 11999224k total,    21428k used, 11977796k free,  5746328k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
14680 rwest     30  10  139m  53m 1044 R  100  0.3 913:58.62 l103.exe                                                
15160 rwest     30  10  139m  53m 1044 R  100  0.3 840:22.49 l103.exe

gmagoon commented 13 years ago

As for this particular molecule: running on pharos at commit 85228e998d95f27d415d9df39b3fc2016b8408cf with this molecule in the condition file, I don't encounter the hang on attempt #10, and attempt #14 "succeeds" but has optimized to a different structure. This is a known issue of unknown significance. I have seen it several times with wacky species in MOPAC, but I think this may be the first time I've seen it occur with Gaussian. I originally had a way of dealing with this, but I'm not sure that it is robust, so it is currently commented out.

In any case, PM3 seems to break this up into two CO2 molecules with other keyword choices, so I suspect that this should be a forbidden structure.

For the more general issue of what to do when Gaussian hangs, consider a timeout option to be on my to-do list.
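For illustration, one way such a timeout could look on the Java side is sketched below. This is not RMG's actual code: the class name ExternalJobRunner and the method runWithTimeout are hypothetical, and the sketch assumes Java 9 or later (for Process.descendants() and ProcessBuilder.Redirect.DISCARD). It starts the external program, waits up to a fixed limit, and if the limit passes it kills the launched process together with any child processes, so the caller can carry on with the next attempt. Killing children matters if the launcher runs link executables such as l103.exe as separate processes.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of RMG: run an external job with a time limit.
class ExternalJobRunner {

    /**
     * Starts the command and waits at most the given time for it to exit.
     * Returns true if the process exited on its own, false if it had to be killed.
     */
    static boolean runWithTimeout(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard the child's output so a full pipe buffer cannot block it.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);

        Process process = builder.start();
        boolean finished = process.waitFor(timeout, unit);
        if (!finished) {
            // Kill children first, while they can still be found via the parent.
            process.descendants().forEach(ProcessHandle::destroyForcibly);
            process.destroyForcibly();
            process.waitFor(); // wait until the killed process has actually exited
        }
        return finished;
    }
}
```

A caller could then treat a false return value like any other failed attempt and move on to the next set of keywords, for example with a limit of 30 minutes per attempt. The actual command line and limit would need to be chosen for the Gaussian installation in use.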

gmagoon commented 13 years ago

PS...here is the adjacency list I used:

1 C 0 {2,S} {3,S} {4,S} {5,S}
2 C 0 {1,S} {3,S} {4,S} {6,S}
3 O 0 {1,S} {2,S}
4 O 0 {1,S} {2,S}
5 O 0 {1,S} {6,S}
6 O 0 {2,S} {5,S}
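
When copying an adjacency list like the one above by hand, it is easy to drop or mistype a bond. The following is a minimal sketch of a consistency check, assuming one atom per line in the form "index element radicals {neighbour,bondOrder} ...". The class AdjacencyListCheck and its methods are hypothetical and are not RMG's own parser. It only verifies that every bond i to j is mirrored by a bond j to i of the same order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not RMG's parser: checks that bonds are listed on both atoms.
class AdjacencyListCheck {

    // Matches one bond entry such as {2,S}: neighbour index, then bond order.
    private static final Pattern BOND = Pattern.compile("\\{(\\d+),([A-Za-z]+)\\}");

    /** Parses lines like "1 C 0 {2,S} {3,S}" into atom index -> (neighbour -> bond order). */
    static Map<Integer, Map<Integer, String>> parse(String text) {
        Map<Integer, Map<Integer, String>> bonds = new HashMap<>();
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            int atom = Integer.parseInt(line.split("\\s+")[0]);
            Map<Integer, String> neighbours = new HashMap<>();
            Matcher m = BOND.matcher(line);
            while (m.find()) {
                neighbours.put(Integer.parseInt(m.group(1)), m.group(2));
            }
            bonds.put(atom, neighbours);
        }
        return bonds;
    }

    /** Returns true if every bond i->j has a matching bond j->i of the same order. */
    static boolean isSymmetric(Map<Integer, Map<Integer, String>> bonds) {
        for (Map.Entry<Integer, Map<Integer, String>> atom : bonds.entrySet()) {
            for (Map.Entry<Integer, String> bond : atom.getValue().entrySet()) {
                Map<Integer, String> back = bonds.get(bond.getKey());
                if (back == null || !bond.getValue().equals(back.get(atom.getKey()))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Passing the six lines of the list above to parse and then to isSymmetric is a quick way to confirm that nothing was lost in transcription.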