Need more details on the jobs

rwest commented 5 years ago

@ehermes just posted via the Sandia GitLab instance:

Can you specify more details about the jobs? What level of theory, what basis set, etc...

nateharms commented 5 years ago

M06-2X/6-311+G(2df,2p) or better, please

nateharms commented 5 years ago

Or the 6-311+G** basis set if the 6-311+G(2df,2p) basis set is unavailable

ehermes commented 5 years ago

I'll try out a few of these calculations to benchmark, but compared to the 6480 saddle point searches I'm already running, your tests are using a more expensive level of theory (I'm using B3LYP/6-31G) and have some larger molecules (my biggest molecules have 8 heavy atoms). There's a chance that your tests will scale better on KNL though, since they're bigger.

Are all of these tests closed-shell? If not, how can I tell which are open shell and which are closed-shell?

rwest commented 5 years ago

I believe they're all H-abstraction, so will all have a radical involved.

ehermes commented 5 years ago

Well, I found at least one geometry where NWChem said a multiplicity of 2 was invalid. I suppose I can just count the number of hydrogens and determine the multiplicity that way.

ehermes commented 5 years ago

Since all of the reactions are H-abstractions, does that mean systems with an even number of electrons should have a multiplicity of 3?
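
Electron parity on its own only narrows the choice. A minimal sketch of that check, assuming neutral species and standard xyz files; the function names and the small atomic-number table are illustrative and not part of this repository:

```python
# Atomic numbers for a few light elements (a small subset; extend as needed).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def count_electrons(xyz_text, charge=0):
    """Count the electrons in a geometry given as standard xyz text."""
    lines = xyz_text.strip().splitlines()
    natoms = int(lines[0])
    # Line 0 is the atom count, line 1 is a comment; atoms start at line 2.
    symbols = [line.split()[0] for line in lines[2:2 + natoms]]
    return sum(ATOMIC_NUMBER[s] for s in symbols) - charge


def allowed_multiplicities(n_electrons, max_mult=4):
    """List the multiplicities consistent with the parity of the electron count."""
    # Even electron counts allow 1, 3, ...; odd counts allow 2, 4, ...
    start = 1 if n_electrons % 2 == 0 else 2
    return list(range(start, max_mult + 1, 2))


if __name__ == "__main__":
    # Hypothetical example geometry: a methyl radical.
    ch3 = """4
methyl radical
C  0.000  0.000 0.000
H  1.079  0.000 0.000
H -0.540  0.934 0.000
H -0.540 -0.934 0.000
"""
    n = count_electrons(ch3)
    print(n, allowed_multiplicities(n))
```

An even electron count is compatible with both a singlet and a triplet, so parity cannot settle the radical-radical or triplet O2 cases; those need the multiplicity from whoever generated the geometries.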

rwest commented 5 years ago

@nateharms are some of them abstracting H from a radical, and thus have an even number of electrons overall?

(And did you include abstraction by triplet O₂?)

nateharms commented 5 years ago

all reactions are H-Abstraction
some reactions are H-abstractions involving two radicals (e.g. OOH abstracting from [CH2]CC or something)
some reactions involve abstraction by triplet O2

ehermes commented 5 years ago

Can you give me some advice on how to determine which multiplicity to use for which geometries?

nateharms commented 5 years ago

Would it be easier if I provided a text file containing a dictionary of files and their corresponding multiplicity?

ehermes commented 5 years ago

If you can provide the multiplicities for each geometry, that would certainly be helpful.

nateharms commented 5 years ago

Okay, a text file has been added with corresponding file names and their multiplicities

ehermes commented 5 years ago

These calculations are going to take a lot of time. Unfortunately, NWChem's DFT routines (LCAO, not plane waves) are not at all optimized for many-core systems like Theta. Since the calculations in my test set are all fairly inexpensive, this wasn't a big problem for them, but when testing M06-2X/6-311++G** I am seeing calculations take 10 minutes per single point on 2 nodes (128 cores). It will be difficult to do a meaningful amount of work in the span of a single job, given the very low time limits for jobs on Theta.
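
To put that timing in perspective, here is a rough back-of-the-envelope sketch. Only the 10 minutes per single point and the 128 cores come from the measurement above; the 60-minute walltime is a made-up placeholder, not Theta's actual limit, and the helper names are illustrative.

```python
def core_hours(minutes, cores):
    """Core-hours consumed by one calculation."""
    return minutes / 60.0 * cores


def single_points_per_job(walltime_minutes, minutes_per_point):
    """Number of complete single points that fit in one job's walltime."""
    return int(walltime_minutes // minutes_per_point)


if __name__ == "__main__":
    MINUTES_PER_POINT = 10   # measured: M06-2X/6-311++G** on 2 nodes
    CORES = 128              # 2 nodes
    WALLTIME_MINUTES = 60    # hypothetical job limit, for illustration only

    print(f"{core_hours(MINUTES_PER_POINT, CORES):.1f} core-hours per single point")
    print(single_points_per_job(WALLTIME_MINUTES, MINUTES_PER_POINT),
          "single points per job")
```

Since a saddle point search needs many single points, only a handful of optimization steps would fit in a job of that assumed length.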

My priority currently is getting my own test set completed. Once that is done, I can start working on these systems. I think it will be a lot easier to first optimize the saddle points using a less expensive level of theory, then re-optimize at M06-2X/6-311++G** -- possibly on a different cluster.

nateharms commented 5 years ago

Gotcha, how about running M06-2X/6-31G? Will that be less expensive? Either that, or run it at whatever level you think is suitable; we can re-optimize them later. Thanks for the help!

ehermes commented 5 years ago

I'll test a couple of different things to see if I can find something reasonably fast that is close to your desired settings. M06-2X being a hybrid meta-GGA also adds a nontrivial cost, particularly in terms of SCF convergence (Truhlar's functionals are notorious for convergence difficulties).

ehermes commented 5 years ago

I've started running these calculations now. I'm going to start by optimizing the saddle points with B3LYP/6-31G, then refining with M06-2X/6-311+G**.
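
For reference, a two-stage run like this could be written as two NWChem inputs. This is only a sketch: the helper `nwchem_saddle_input` and the placeholder geometry are hypothetical, it relies on NWChem's built-in `task dft saddle` driver, and it is not necessarily how the jobs in this thread are actually driven.

```python
def nwchem_saddle_input(name, atoms, mult, xc, basis, charge=0):
    """Render an NWChem input for a DFT saddle point search.

    atoms is a list of (symbol, x, y, z) tuples in angstroms.
    """
    geom = "\n".join(
        f"  {s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in atoms
    )
    return (
        f"start {name}\n"
        f"charge {charge}\n"
        f"geometry units angstroms\n{geom}\nend\n"
        f"basis\n  * library {basis}\nend\n"
        f"dft\n  xc {xc}\n  mult {mult}\nend\n"
        f"task dft saddle\n"
    )


# Stage 1 is the cheap pre-optimization, stage 2 the refinement.
STAGES = [("b3lyp", "6-31G"), ("m06-2x", "6-311+G**")]

if __name__ == "__main__":
    # Placeholder geometry only; a real run would use a transition state
    # guess, and stage 2 would start from the stage 1 optimized geometry.
    atoms = [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.9), ("H", 0.0, 0.0, 1.8)]
    for i, (xc, basis) in enumerate(STAGES, start=1):
        print(nwchem_saddle_input(f"ts_stage{i}", atoms, 2, xc, basis))
```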

I noticed there is a med directory as well, but these structure multiplicities are not in mults.txt. Do you want me to run these structures as well? If so, can you provide their multiplicities?

nateharms commented 5 years ago

@ehermes an updated mults.py file exists containing the multiplicities for all the files

ehermes commented 5 years ago

I'm having trouble parsing this file automatically. It seems like there's a strange character on line 2859.
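
One way to pin down a character like that is to scan the text for anything that is not plain printable ASCII. A minimal sketch; the function name is illustrative and the sample string is made up, not taken from the actual file:

```python
import unicodedata


def find_odd_characters(text):
    """Return (line, column, repr, unicode_name) for each suspicious character.

    A character is flagged if it is non-ASCII, DEL, or an ASCII control
    character other than tab. Line and column numbers are 1-based.
    """
    hits = []
    # Split on "\n" only, so unusual line separators are reported, not hidden.
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 126 or (ord(ch) < 32 and ch != "\t"):
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((lineno, col, repr(ch), name))
    return hits


if __name__ == "__main__":
    # Made-up sample with a no-break space hiding on line 2.
    sample = "low/a.xyz,2\nhigh/b\u00a0.xyz,3\n"
    for hit in find_odd_characters(sample):
        print(hit)
```

For a real file, reading it with `open(path, encoding="utf-8", errors="replace")` keeps the scan from failing on undecodable bytes; they show up as U+FFFD in the report.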

nateharms commented 5 years ago

Hmmm... I'm not sure why it isn't working... I'm able to read in the file easily. But I'll try writing a new file and see if that helps.

nateharms commented 5 years ago

@ehermes a file called mults.csv was added. I didn't have any trouble reading it in with pandas. Hope this helps!
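
For anyone not using pandas, a file like this can also be read with the standard library. A minimal sketch: the `multiplicity` column name, the helper name, and the sample rows are assumptions for illustration, so adjust them to the actual header of mults.csv.

```python
import csv
import io


def load_multiplicities(handle, name_col="file_name", mult_col="multiplicity"):
    """Read a CSV of geometry file names and multiplicities into a dict."""
    mults = {}
    for row in csv.DictReader(handle):
        name = row[name_col]
        if name in mults:
            # Each geometry should be listed exactly once.
            raise ValueError(f"duplicate entry for {name}")
        mults[name] = int(row[mult_col])
    return mults


if __name__ == "__main__":
    # Made-up rows standing in for the real file.
    sample = io.StringIO(
        "file_name,multiplicity\n"
        "low/example_a.xyz,2\n"
        "high/example_b.xyz,3\n"
    )
    print(load_multiplicities(sample))
```

With the real file, pass `open("mults.csv", newline="")` instead of the in-memory sample.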

rwest commented 5 years ago

Some seem to be missing:

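import os
import pandas

# Check that every .xyz file in the repository is listed exactly once in mults.csv.
df = pandas.read_csv('mults.csv')
for dirpath, dirnames, filenames in os.walk('.'):
    # Skip the repository root itself and anything under .git.
    if './' not in dirpath or '.git' in dirpath:
        continue
    # Path relative to the root, e.g. 'high'; also works for nested directories.
    d = os.path.relpath(dirpath, '.')
    for f in filenames:
        if not f.endswith('.xyz'):
            continue
        p = os.path.join(d, f)
        # Number of rows in mults.csv whose file_name matches this path.
        found = (df.file_name == p).sum()
        if found != 1:
            print(p, found)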

high/C=CC=C+[O]O_C=[C]C=C+OO.xyz 0
high/[O]O+[CH2]CCC_CCCC+[O][O].xyz 0
low/C=CCC+[CH]=C_C=[C]CC+C=C.xyz 0
low/[O]OC=O+CC(C)(C)O_O=COO+CC(C)(C)[O].xyz 0
low/CO+CCCCO[O]_[CH2]O+CCCCOO.xyz 0
low/CCCO[O]+C=CC_CCCOO+[CH]=CC.xyz 0
low/CC(C)O[O]+CC=O_CC(C)OO+C[C]=O.xyz 0
low/C[C]=O+CCC(C)O_CC=O+C[CH]C(C)O.xyz 0
low/C#CC+[O][O]_C#C[CH2]+[O]O.xyz 0
low/[CH2]O+CCCC_CO+C[CH]CC.xyz 0
low/[CH3]+C=C=O_C+[CH]=C=O.xyz 0
med/[CH3]+CC(C)CC(=O)CC(C)C_[CH2]C(C)CC(=O)CC(C)C+C.xyz 0
med/[OH]+CCCC(C)CO_CCCC(C)[CH]O+O.xyz 0
med/[H]+CCCC=CC(C)C_C[CH]CC=CC(C)C+[H][H].xyz 0
med/[H]+CCC(C)CC(C)C_[H][H]+CCC(C)[CH]C(C)C.xyz 0
med/[H]+CCCCCC1CCC(C)O1_CCCCCC1C[CH]C(C)O1+[H][H].xyz 0

nateharms commented 5 years ago

@rwest thanks for the catch, should be good now 👍
