RagnarB83 / ash

ASH is a Python-based computational chemistry and QM/MM environment, primarily for molecular calculations in the gas phase, explicit solution, crystal or protein environment.
GNU General Public License v2.0
57 stars 13 forks source link

Question on ASH's default convergence behavior on QM/MM jobs and broken-symmetry behavior #429

Closed Lisa-Phan closed 2 weeks ago

Lisa-Phan commented 2 weeks ago

Good evening,

I have a few questions about ASH's default convergence choice when running QM/MM. I recall ORCATheory has an option for convergence threshold parameters, but the overall default behavior for QM/MM is that the run stops after 250 iterations?

Additionally, I'm encountering a strange behavior for broken symmetry QM/MM optimization and was wondering if it was due to user error. The system is the same as one mentioned the previous thread. Compared to the singlet or highspin treatment, which converges stably, broken symmetry system seems much less stable. The xyz shows expansion of the QM region, along with energy initially lowering then rising in a parabolic pattern.

Comparing the orca_last.out file for the two jobs, only the highspin run has both mulliken atomic and spin population printed whereas the broken symmetry run only has atomic population. Does the slight reduction in mulliken atomic population suggested incorrect set up?

My apologies for the rough graphs. Broken_sym_QMMM_energy Highspin_QMMM_energy

Attached beneath is the script I'd used for the broken symmetry run, which used a partially optimized .pdb as the starting point. This is because using the original unoptimized structure to start causes things to not converge.

"""
10/4/2024
Restart broken symmetry solution from 
old incorrect solution
"""
from ash import *

###########################
# VARIABLES
###########################

###########################
# A. FILES
###########################

#solvated_pdb generated from old job
solvated_pdb = 'last_snapshot_6YDI.pdb'

#amber protein forcefield xml
# You can find Amber XML files in a directory here:
# YOURCONDADIRECTORY/envs/ASH_ENVIRONMENTNAME/lib/python3.XX/site-packages/openmm/app/data/amber14
amber_xml = 'amber14-all.xml'

#Defining XML for Fe-part only and solvent and Na (had to be done to avoid conflict with Amber solvent/ions)
specialxmlfile = "fepart.xml"

###########################
# B. RESIDUES SPEC
###########################

#QM-region definition
#Fe and H2O
feions = [8092,8093]
waters= [8094,8095,8096,8097,8098,8099]

#Bound residues
GLU99 = [1610,1611,1612,1613,1614,1615]
GLU129 = [2046,2047,2048,2049,2050,2051]
GLU228 = [3557,3558,3559,3560,3561,3562]
GLU194 = [3062, 3063,3064,3065,3066,3067]
HIE132 = [2101,2102,2103,2104,2105,2106,2107,2108,2109,2110,2111]
HIE231 =  [3612,3613,3614,3615,3616,3617,3618,3619,3620,3621,3622]
#H-bonded
GLN125 = [1982,1983,1984,1985,1986,1987,1988,1989]

###########################
# C. ORCA and QM/MM SPEC
###########################
input_line = "! BP86 def2-SVP tightscf"
extrabasisatoms = [8092,8093] #diiron
extrabasis = 'def2-TZVP'

#Charge and mult of QM-region  (my guesses)
#-4 charge from carboxylates, 
#+4 from 2Fe(II)
charge = 0
mult = 1 #full highspin is 11, broken symmetry is 1

###########################
# MAIN
###########################

frag = Fragment(pdbfile = solvated_pdb)

omm = OpenMMTheory(xmlfiles = [amber_xml, specialxmlfile], 
                   pdbfile = solvated_pdb,  
                   autoconstraints=None, 
                   rigidwater=False)

#QM theory. Using xTB as a test, can change to ORCA
qm = ORCATheory(orcadir='/stor/work/YiLu/orca/orca_three_parts',
                 numcores=8, 
                 orcasimpleinput = input_line,
                 brokensym = True,
                 extrabasisatoms = extrabasisatoms,
                 extrabasis = extrabasis,
                 HSmult = 11, 
                 atomstoflip = [8093])

#Combining everything to a big list
qmregion = feions+waters+GLU99+GLU129+GLU228+GLU194+HIE132+HIE231+GLN125

#Defining QM/MM Theory
qm_mm = QMMMTheory(qm_theory=qm, 
                   mm_theory=omm,
                   qmatoms=qmregion,
                   fragment=frag,
                   qm_charge=charge,
                   qm_mult=mult)

#Run MD or Opt
#MolecularDynamics(theory=qm_mm, fragment=frag, simulation_steps=100, temperature=100)
Optimizer(theory=qm_mm,
          fragment=frag,
          ActiveRegion=True,
          actatoms=qmregion)

and this is the script used for the full high-spin treatment

"""
10/4/2024
Run highspin
"""
from ash import *

###########################
# VARIABLES
###########################

###########################
# A. FILES
###########################

#solvated_pdb generated directly from amber .prmtop and .inpcrd
#solvated_pdb = 'last_snapshot_6YDI.pdb'
solvated_pdb = '6YDI_solv_RBmod.pdb'

#amber protein forcefield xml
# You can find Amber XML files in a directory here:
# YOURCONDADIRECTORY/envs/ASH_ENVIRONMENTNAME/lib/python3.XX/site-packages/openmm/app/data/amber14
amber_xml = 'amber14-all.xml'

#Defining XML for Fe-part only and solvent and Na (had to be done to avoid conflict with Amber solvent/ions)
specialxmlfile = "fepart.xml"

###########################
# B. RESIDUES SPEC
###########################

#QM-region definition
#Fe and H2O
feions = [8092,8093]
waters= [8094,8095,8096,8097,8098,8099]

#Bound residues
GLU99 = [1610,1611,1612,1613,1614,1615]
GLU129 = [2046,2047,2048,2049,2050,2051]
GLU228 = [3557,3558,3559,3560,3561,3562]
GLU194 = [3062, 3063,3064,3065,3066,3067]
HIE132 = [2101,2102,2103,2104,2105,2106,2107,2108,2109,2110,2111]
HIE231 =  [3612,3613,3614,3615,3616,3617,3618,3619,3620,3621,3622]
#H-bonded
GLN125 = [1982,1983,1984,1985,1986,1987,1988,1989]

###########################
# C. ORCA and QM/MM SPEC
###########################
input_line = "! BP86 def2-SVP tightscf"
extrabasisatoms = [8092,8093] #diiron
extrabasis = 'def2-TZVP'

#Charge and mult of QM-region  (my guesses)
#-4 charge from carboxylates, 
#+4 from 2Fe(II)
charge = 0
mult = 11 #full highspin is 11, broken symmetry is 1

###########################
# MAIN
###########################

frag = Fragment(pdbfile = solvated_pdb)

omm = OpenMMTheory(xmlfiles = [amber_xml, specialxmlfile], 
                   pdbfile = solvated_pdb,  
                   autoconstraints=None, 
                   rigidwater=False)

#QM theory. Using xTB as a test, can change to ORCA
qm = ORCATheory(orcadir='/stor/work/YiLu/orca/orca_three_parts',
                 numcores=8, 
                 orcasimpleinput = input_line,
                 #brokensym = True,
                 extrabasisatoms = extrabasisatoms,
                 extrabasis = extrabasis)
                 #HSmult = 11, 
                 #atomstoflip = [8093],

#Combining everything to a big list
qmregion = feions+waters+GLU99+GLU129+GLU228+GLU194+HIE132+HIE231+GLN125

#Defining QM/MM Theory
qm_mm = QMMMTheory(qm_theory=qm, 
                   mm_theory=omm,
                   qmatoms=qmregion,
                   fragment=frag,
                   qm_charge=charge,
                   qm_mult=mult)

#Run MD or Opt
#MolecularDynamics(theory=qm_mm, fragment=frag, simulation_steps=100, temperature=100)
Optimizer(theory=qm_mm,
          fragment=frag,
          ActiveRegion=True,
          actatoms=qmregion)

For the broken symmetry run

                    * MULLIKEN POPULATION ANALYSIS *
                    ********************************

-----------------------
MULLIKEN ATOMIC CHARGES
-----------------------
   0 C :    0.029802
   1 H :    0.072457
   2 H :    0.016628
   3 C :    0.070703
   4 O :   -0.272410
   5 O :   -0.323978
   6 C :   -0.035738
   7 H :    0.085916
   8 H :    0.047390
   9 C :   -0.015024
  10 O :   -0.355577
  11 N :   -0.019122
  12 H :    0.096909
  13 H :    0.131034
  14 C :    0.031496
  15 H :    0.104045
  16 H :    0.020871
  17 C :    0.116219
  18 O :   -0.291751
  19 O :   -0.254690
  20 C :    0.099177
  21 H :    0.043941
  22 H :    0.067915
  23 C :   -0.197818
  24 N :   -0.111446
  25 C :    0.095118
  26 H :    0.012032
  27 N :   -0.021575
  28 H :    0.276580
  29 C :   -0.017181
  30 H :    0.049997
  31 C :    0.024873
  32 H :   -0.028635
  33 H :    0.109517
  34 C :    0.097405
  35 O :   -0.269118
  36 O :   -0.341148
  37 C :    0.033663
  38 H :    0.075316
  39 H :    0.096190
  40 C :    0.112173
  41 O :   -0.322070
  42 O :   -0.204271
  43 C :    0.115405
  44 H :    0.076970
  45 H :    0.068442
  46 C :   -0.116808
  47 N :   -0.197920
  48 C :    0.114362
  49 H :    0.064863
  50 N :   -0.022967
  51 H :    0.250512
  52 C :    0.005217
  53 H :   -0.029945
  54 Fe:    0.213821
  55 Fe:    0.181370
  56 O :   -0.150454
  57 H :    0.168326
  58 H :    0.164314
  59 O :   -0.082961
  60 H :    0.221728
  61 H :    0.203023
  62 H :   -0.018124
  63 H :   -0.011590
  64 H :    0.007520
  65 H :   -0.045359
  66 H :   -0.036006
  67 H :   -0.012360
  68 H :   -0.067194
Sum of atomic charges:   -0.0000000

For the all-highspin run

                    * MULLIKEN POPULATION ANALYSIS *
                    ********************************

--------------------------------------------
MULLIKEN ATOMIC CHARGES AND SPIN POPULATIONS
--------------------------------------------
   0 C :   -0.002921    0.004515
   1 H :    0.083397    0.001228
   2 H :    0.046341    0.000804
   3 C :    0.121258    0.006014
   4 O :   -0.301499    0.095056
   5 O :   -0.318297    0.018615
   6 C :   -0.053175    0.000359
   7 H :    0.093270    0.000036
   8 H :    0.052311   -0.000202
   9 C :    0.019843   -0.000069
  10 O :   -0.392233    0.000038
  11 N :   -0.076541    0.000296
  12 H :    0.117767    0.000233
  13 H :    0.147601    0.000990
  14 C :    0.001439    0.008943
  15 H :    0.118617    0.000568
  16 H :    0.042000    0.001099
  17 C :    0.156016    0.013046
  18 O :   -0.338259    0.067627
  19 O :   -0.267306    0.060456
  20 C :    0.063326   -0.000428
  21 H :    0.073686    0.000407
  22 H :    0.061733    0.000197
  23 C :   -0.070720    0.001907
  24 N :   -0.259411    0.067412
  25 C :    0.135342    0.010282
  26 H :    0.062129    0.000298
  27 N :   -0.039482    0.003125
  28 H :    0.261784    0.001008
  29 C :    0.000932    0.010533
  30 H :    0.059849    0.000818
  31 C :   -0.003475    0.003825
  32 H :    0.030529    0.001237
  33 H :    0.106126    0.000179
  34 C :    0.139130    0.008986
  35 O :   -0.285609    0.034398
  36 O :   -0.351982    0.142930
  37 C :    0.020211   -0.061577
  38 H :    0.043254    0.006123
  39 H :    0.080023    0.057366
  40 C :    0.105106    0.773274
  41 O :   -0.449036    0.241565
  42 O :   -0.402549    0.252490
  43 C :    0.087386    0.000492
  44 H :    0.089176    0.000799
  45 H :    0.078118    0.000093
  46 C :   -0.090312    0.005769
  47 N :   -0.243270    0.049546
  48 C :    0.122213    0.019502
  49 H :    0.072664    0.000578
  50 N :   -0.059686    0.006498
  51 H :    0.262322    0.000755
  52 C :    0.003666    0.004248
  53 H :   -0.010917    0.001306
  54 Fe:    0.479361    3.984602
  55 Fe:    0.367358    3.886631
  56 O :   -0.243510    0.069831
  57 H :    0.206230    0.003526
  58 H :    0.184883    0.010147
  59 O :   -0.177833    0.047009
  60 H :    0.200043    0.013178
  61 H :    0.189714    0.033570
  62 H :   -0.006987    0.000060
  63 H :   -0.006682    0.000052
  64 H :    0.013144    0.001316
  65 H :   -0.034859   -0.000083
  66 H :   -0.025706    0.002090
  67 H :   -0.029987    0.022520
  68 H :   -0.057053   -0.000047
Sum of atomic charges         :   -0.0000000
Sum of atomic spin populations:   10.0000000

I'd really appreciate any pointers on how to troubleshoot the run.

Thank you, Lisa Phan

RagnarB83 commented 2 weeks ago

So regarding convergence, the optimizer by default has a max-iterations setting of 250. Can be changed like this: Optimizer(..., maxiter=500). Btw, for QM/MM optimizations of a protein like this, you may need a lot iterations initially because all coordinates are in bad positions coming from either X-ray structure of a classical MD job. This particularly applies when you have a large active region, e.g. 1000 atoms.

Your broken-symmetry job is almost correctly setup, there is just one detail that applies to the case of broken-symmetry singlets. When you tell ASH to request a broken-symmetry job, it creates an ORCA inputfile containing that input for an open-shell broken-symmetry Flipspin procedure and ORCA will then converge to the open-shell singlet, creating a GBW file containing the orbitals for the broken-symmetry job. This happens in Optimization number 1. In the next optimization step, ASH has turned off the broken-symmetry procedure because there is no good reason to go through the Flipspin procedure again (unnecessarily timeconsuming). Instead ASH creates an ORCA inputfile for a singlet job in the next step and tries to read the GBW file from the previous step; but what goes wrong here is that ORCA by default, for singlets (multiplicity 1), tries to converge a restrickted (RKS), closed-shell electronic structure, instead of unrestricted, UKS. This means that while it reads the broken-symmetry orbitals it interprets them as closed-shell and converges to a useless closed-shell solution. For som reason things go wrong from there.

The simple solution to this is to add "UKS" keyword to your input_line, i.e. input_line = "! BP86 def2-SVP tightscf UKS".

I have updated the documentation, warning about this. I will try to add an option to ORCATheory so that this is automatically performed if the user forgets the UKS keyword.

I think this should solve the problem.

Lisa-Phan commented 2 weeks ago

Thank you Ragnar for the detailed explanation!

I believe this is the second time I've made this mistake... My apologies for not checking the set up more closely.