Open rwest opened 5 years ago
M06-2X/6-311+G(2df,2p) or better, please.
Or the 6-311+G** basis set if the 6-311+G(2df,2p) basis set is unavailable.
I'll try out a few of these calculations to benchmark, but compared to the 6480 saddle point searches I'm already running, your tests are using a more expensive level of theory (I'm using B3LYP/6-31G) and have some larger molecules (my biggest molecules have 8 heavy atoms). There's a chance that your tests will scale better on KNL though, since they're bigger.
Are all of these tests closed-shell? If not, how can I tell which are open shell and which are closed-shell?
I believe they're all H-abstraction, so will all have a radical involved.
Well, I found at least one geometry where NWChem said a multiplicity of 2 was invalid. I suppose I can just count the number of hydrogens and determine the multiplicity that way.
Since all of the reactions are H-abstractions, does that mean systems with an even number of electrons should have a multiplicity of 3?
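For reference, electron counting only fixes the parity of the multiplicity; it cannot by itself choose between singlet and triplet for an even-electron system. A minimal sketch of that check, assuming standard XYZ files, neutral species, and only H, C, N and O atoms (the helper names here are hypothetical, not part of anyone's scripts in this thread):

```python
# Atomic numbers for the elements expected in these systems
# (assumption: only H, C, N and O appear).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def count_electrons(xyz_text, charge=0):
    """Sum atomic numbers over the atoms of a standard XYZ file."""
    lines = xyz_text.strip().splitlines()
    natoms = int(lines[0].split()[0])
    # Line 0 is the atom count, line 1 is a comment; atoms start on line 2.
    symbols = [line.split()[0] for line in lines[2:2 + natoms]]
    return sum(ATOMIC_NUMBERS[s] for s in symbols) - charge


def allowed_multiplicities(n_electrons, max_mult=4):
    """Multiplicities 2S+1 compatible with the parity of the electron count."""
    # Even electron count -> odd multiplicity (1, 3); odd count -> even (2, 4).
    first = 1 if n_electrons % 2 == 0 else 2
    return list(range(first, max_mult + 1, 2))


# Illustrative coordinates only, not taken from the test set.
example = """4
methyl radical
C  0.000  0.000 0.000
H  1.079  0.000 0.000
H -0.540  0.934 0.000
H -0.540 -0.934 0.000
"""
n = count_electrons(example)
print(n, allowed_multiplicities(n))  # 9 [2, 4]
```

An even count returns [1, 3], so the singlet/triplet choice still has to come from knowing the reactants (for example, whether triplet O₂ or two radicals are involved).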
@nateharms are some of them abstracting H from a radical, and thus have an even number of electrons overall?
(And did you include abstraction by triplet O₂?)
Can you give me some advice on how to determine which multiplicity to use for which geometries?
Would it be easier if I provided a text file containing a dictionary of files and their corresponding multiplicity?
If you can provide the multiplicities for each geometry, that would certainly be helpful.
Okay, a text file has been added with the file names and their corresponding multiplicities.
These calculations are going to take a lot of time. Unfortunately, NWChem's DFT routines (LCAO, not plane waves) are not at all optimized for many-core systems like Theta. Since the calculations in my test set are all fairly inexpensive, this wasn't a big problem, but testing M06-2X/6-311++G** I am seeing calculations take 10 minutes per single point on 2 nodes (128 cores). It will be difficult to do a meaningful amount of work in the span of a single job, given the very low time limits for jobs on Theta.
My priority currently is getting my own test set completed. Once it is done, I can start working on these systems. I think it will be a lot easier to first optimize the saddle points using a less expensive level of theory, then re-optimize at M06-2X/6-311++G** -- possibly on a different cluster.
Gotcha, how about running M06-2X/6-31G? Will that be less expensive? Either that, or run whatever you think is suitable; we can re-optimize them later. Thanks for the help!
I'll test a couple of different things to see if I can find something reasonably fast that is close to your desired settings. M06-2X being a meta-GGA also adds a nontrivial cost, particularly in terms of SCF convergence (Truhlar's functionals have notorious convergence difficulties).
I've started running these calculations now. I'm going to start by optimizing the saddle points with B3LYP/6-31G, then refining with M06-2X/6-311+G**.
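A rough sketch of what the two-stage inputs could look like, written as a small Python helper that emits NWChem input text. The function name and layout are hypothetical and this is not the script actually used for these runs; it assumes NWChem's `dft` block keywords `xc` and `mult`, library basis sets, and `task dft saddle`, with the second stage started from the geometry converged in the first.

```python
def nwchem_saddle_input(name, xyz_atoms, mult, xc, basis):
    """Build an NWChem input for a DFT saddle-point search.

    xyz_atoms is a list of (symbol, x, y, z) tuples in angstroms.
    """
    geometry = "\n".join(
        f"  {sym} {x:.6f} {y:.6f} {z:.6f}" for sym, x, y, z in xyz_atoms
    )
    return (
        f"start {name}\n"
        f"geometry units angstroms\n{geometry}\nend\n"
        f"basis\n  * library {basis}\nend\n"
        f"dft\n  xc {xc}\n  mult {mult}\nend\n"
        "task dft saddle\n"
    )


# Placeholder geometry for illustration only.
atoms = [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.9), ("H", 0.0, 0.0, 1.8)]

# Stage 1: cheap saddle-point optimization.
stage1 = nwchem_saddle_input("ts_b3lyp", atoms, 2, "b3lyp", "6-31G")
# Stage 2: refinement; in practice `atoms` would be replaced by the
# geometry converged in stage 1.
stage2 = nwchem_saddle_input("ts_m062x", atoms, 2, "m06-2x", "6-311+G**")
print(stage1)
```

Keeping the functional and basis as arguments means the same template serves both stages, and the multiplicity can be filled in per geometry from the multiplicity file.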
I noticed there is a med directory as well, but these structure multiplicities are not in mults.txt. Do you want me to run these structures as well? If so, can you provide their multiplicities?
@ehermes an updated mults.py file exists containing the multiplicities for all the files.
I'm having trouble parsing this file automatically. It seems like there's a strange character on line 2859.
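In case it helps track this down, here is a minimal sketch for locating odd characters in a text file. It assumes the culprit is a non-ASCII or control character, which is only a guess; the function name is made up for illustration.

```python
def find_suspicious_characters(path):
    """Return (line_number, column, repr(char)) for non-ASCII or control characters."""
    hits = []
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            # Decode leniently so an invalid byte sequence does not abort the scan.
            text = raw.decode("utf-8", errors="replace")
            for column, char in enumerate(text.rstrip("\r\n"), start=1):
                is_control = ord(char) < 32 and char != "\t"
                if ord(char) > 126 or is_control:
                    hits.append((line_number, column, repr(char)))
    return hits
```

Calling `find_suspicious_characters("mults.py")` and filtering for line 2859 would show exactly which character is there and in which column, for example a non-breaking space or a subscript digit copied from a web page.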
Hmmm... I'm not sure why it isn't working... I'm able to read in the file easily. But I'll try writing a new file and seeing if that helps
@ehermes a file called mults.csv was added. I didn't have any trouble reading it in with pandas. Hope this helps!
Some seem to be missing:
import pandas
import os

df = pandas.read_csv('mults.csv')
for dirpath, dirnames, filenames in os.walk('.'):
    # Skip the top-level directory itself and anything under .git
    if './' not in dirpath or '.git' in dirpath:
        continue
    _, d = dirpath.split('/')
    for f in filenames:
        if '.xyz' not in f:
            continue
        p = os.path.join(d, f)
        # Count the rows of mults.csv whose file_name matches this geometry
        found = sum(df.file_name == p)
        if found != 1:
            print(p, found)
high/C=CC=C+[O]O_C=[C]C=C+OO.xyz 0
high/[O]O+[CH2]CCC_CCCC+[O][O].xyz 0
low/C=CCC+[CH]=C_C=[C]CC+C=C.xyz 0
low/[O]OC=O+CC(C)(C)O_O=COO+CC(C)(C)[O].xyz 0
low/CO+CCCCO[O]_[CH2]O+CCCCOO.xyz 0
low/CCCO[O]+C=CC_CCCOO+[CH]=CC.xyz 0
low/CC(C)O[O]+CC=O_CC(C)OO+C[C]=O.xyz 0
low/C[C]=O+CCC(C)O_CC=O+C[CH]C(C)O.xyz 0
low/C#CC+[O][O]_C#C[CH2]+[O]O.xyz 0
low/[CH2]O+CCCC_CO+C[CH]CC.xyz 0
low/[CH3]+C=C=O_C+[CH]=C=O.xyz 0
med/[CH3]+CC(C)CC(=O)CC(C)C_[CH2]C(C)CC(=O)CC(C)C+C.xyz 0
med/[OH]+CCCC(C)CO_CCCC(C)[CH]O+O.xyz 0
med/[H]+CCCC=CC(C)C_C[CH]CC=CC(C)C+[H][H].xyz 0
med/[H]+CCC(C)CC(C)C_[H][H]+CCC(C)[CH]C(C)C.xyz 0
med/[H]+CCCCCC1CCC(C)O1_CCCCCC1C[CH]C(C)O1+[H][H].xyz 0
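The same check can be sketched with only the standard library as a set difference between the .xyz files on disk and the names listed in the CSV. This assumes the CSV has a `file_name` column holding paths such as `low/foo.xyz` (as the snippet above implies); the function name is hypothetical. Unlike the loop above, it reports only missing entries, not duplicated ones.

```python
import csv
import os


def missing_geometries(root, csv_path, column="file_name"):
    """Return sorted relative .xyz paths under root that have no row in the CSV."""
    with open(csv_path, newline="") as handle:
        listed = {row[column] for row in csv.DictReader(handle)}
    on_disk = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for filename in filenames:
            if filename.endswith(".xyz"):
                full = os.path.join(dirpath, filename)
                # Use forward slashes so paths match entries like "low/foo.xyz".
                on_disk.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(on_disk - listed)
```

Running `missing_geometries(".", "mults.csv")` from the repository root should return an empty list once every geometry has a multiplicity.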
@rwest thanks for the catch, should be good now 👍
@ehermes just posted via the sandia gitlab thing: