RagnarB83 / ash

ASH is a Python-based computational chemistry and QM/MM environment, primarily for molecular calculations in the gas phase, explicit solution, crystal or protein environment.
GNU General Public License v2.0
58 stars 13 forks source link

Bug & issue report #431

Open Zuttergutao opened 1 week ago

Zuttergutao commented 1 week ago

Hi RagnarB83,

I’ve been working on learning ASH recently, and I’ve encountered several issues while going through the ASH tutorial. I’d like to report them here:

Problems:

  1. The solvate_small_molecule function doesn’t seem to work. I’m getting a TypeError: solvate_small_molecule received an unexpected keyword argument “nonbonded_pars.”
  2. While running a QM/MM geometry optimization for a protein’s QM region (around 100 atoms), the process struggles to meet the convergence criteria and tends to get stuck in local minima. This happens even when I set the number of optimization steps to 500. The converge results were 1e-2/1e-2 (rms/max) for the gradient and 1e-3/1e-3 (rms/max) for the displace.
Zuttergutao commented 1 week ago
  1. When I run the Surface_scan job using the QMMMTheory method, I encounter an AttributeError: 'QMMMTheory' object has no attribute 'Filename'.
RagnarB83 commented 1 week ago

Hi, thanks for reporting. I appreciate someone going through the tutorials and reporting back, this the only way of making sure they remain up-to-date.

  1. Solvate_small_molecule function: Unfortunately, the behaviour of this function changed, and the Metadynamics tutorial had not been properly updated. Previously, the function both creates the small-molecule FF and solvated, now it only solvates and requires the FF to have been created by another function. This is now documented on the page, let me know if it's clear.

  2. So, it is often the case for newly set-up protein systems that the first geometry optimization is quite tough to converge since all the atom positions are quite bad (coming either from an X-ray structure or from an MM simulation), so sometimes a few hundred optimization cycles are required. One can speed things up for this first optimization by choosing a cheap QM-level of theory and then switch to the more accurate QM-theory later. Later optimizations will be easier to converge. This typically applies when you have a large active region (not QM-region) of e.g. a 1000 atoms. It's also possible that to try another internal-coordinate system, the default is HDLC. If you send me the outputfile I can also check if something abnormal is going on.

  3. Probably a bug, let me take a look.

RagnarB83 commented 1 week ago
  1. I fixed the bug that gave an Attribute error for scans for QMMMTheory objects.

The fix is available on the new branch. Will be on the main branch in a week or so after a merge.

Zuttergutao commented 1 week ago

Hi, thanks for reporting. I appreciate someone going through the tutorials and reporting back, this the only way of making sure they remain up-to-date.

  1. Solvate_small_molecule function: Unfortunately, the behaviour of this function changed, and the Metadynamics tutorial had not been properly updated. Previously, the function both creates the small-molecule FF and solvated, now it only solvates and requires the FF to have been created by another function. This is now documented on the page, let me know if it's clear.
  2. So, it is often the case for newly set-up protein systems that the first geometry optimization is quite tough to converge since all the atom positions are quite bad (coming either from an X-ray structure or from an MM simulation), so sometimes a few hundred optimization cycles are required. One can speed things up for this first optimization by choosing a cheap QM-level of theory and then switch to the more accurate QM-theory later. Later optimizations will be easier to converge. This typically applies when you have a large active region (not QM-region) of e.g. a 1000 atoms. It's also possible that to try another internal-coordinate system, the default is HDLC. If you send me the outputfile I can also check if something abnormal is going on.
  3. Probably a bug, let me take a look.

Thanks for replying! However, i still encounter some problems.

  1. Regarding problem 1.

    • The option 1 did not work.the error is `you must have at least OpenMM 6.3 Installed". But I have installed OpenMM 8.1.2.
    • The option 2 works. But the mtd script gave the valueError ValueError: Found multiple NonbondedForce tags with different 1-4 scales Herei is the generated forcefield files. I think the problem is the difference of coulomb14scale between the LIG.xml you provided (0.83333) and I generated (1.0) ff.zip
  2. Regarding Problem 2. The following is the file I used. I am a beginner in QM/MM and hope you can give me some advice and guidance. When I set the convergence limit to superloose, it converged quickly. input.zip

  3. Regarding Problem 3. It works!

Zuttergutao commented 1 week ago
  1. I fixed the bug that gave an Attribute error for scans for QMMMTheory objects.

The fix is available on the new branch. Will be on the main branch in a week or so after a merge.

Interestingly, it was able to do the calculations. But there were some problems with the output (I was using QM/MM two CV scans).

  File "/home/casea/CASEADATA/Research Project/Pathway/BmFaldDH/Amber/ASH/PESScan/orca_pes.py", line 60, in <module>
    results = calc_surface(fragment=frag, theory=qmmmobject, scantype='UnRelaxed', resultfile='surface_results.txt', runmode='serial', 
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_surface.py", line 507, in calc_surface
    result.write_to_disk(filename="ASH_surface.result")
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_results.py", line 122, in write_to_disk
    f.write(json.dumps(newdict, allow_nan=True))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
TypeError: keys must be str, int, float, bool or None, not tuple  File "/home/casea/CASEADATA/Research Project/Pathway/BmFaldDH/Amber/ASH/PESScan/orca_pes.py", line 60, in <module>
    results = calc_surface(fragment=frag, theory=qmmmobject, scantype='UnRelaxed', resultfile='surface_results.txt', runmode='serial', 
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_surface.py", line 507, in calc_surface
    result.write_to_disk(filename="ASH_surface.result")
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_results.py", line 122, in write_to_disk
    f.write(json.dumps(newdict, allow_nan=True))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
TypeError: keys must be str, int, float, bool or None, not tuple

In the meantime, is it possible to write a function so that each cv energy calculated can be output when it is updated, rather than outputting all together at the end?

RagnarB83 commented 1 week ago
  • The option 1 did not work.the error is `you must have at least OpenMM 6.3 Installed". But I have installed OpenMM 8.1.2.

OK interesting, so this is not an ASH error message but a flawed error message coming from the parmed library dependency. Do you mind posting the full error message (if something more) and the version of the parmed library that was installed (and whether you installed it using pip or conda/mamba).

  • The option 2 works. But the mtd script gave the valueError ValueError: Found multiple NonbondedForce tags with different 1-4 scales Here is the generated forcefield files. I think the problem is the difference of coulomb14scale between the LIG.xml you provided (0.83333) and I generated (1.0) ff.zip

So this is a bit of an annoying thing that happens due to the difference between CHARMM-style and Amber-style forcefields and the way OpenMM handles this, and then ASH-generated XML-files vs. pairing them with a solvent forcefield like TIP3P that can also exist in CHARMM and Amber forms. This is mentioned in the ASH documentation here: https://ash.readthedocs.io/en/latest/Explicit-solvation.html#issues

I will have to update the tutorials once I'm done travelling to make sure this problem is avoided. Pairing the solute FF XML-file with e.g. the Amber-style TIP3P forcefield: "amber14/tip3p.xml" should work to avoid this problem right now.

  1. Regarding Problem 2. The following is the file I used. I am a beginner in QM/MM and hope you can give me some advice and guidance. When I set the convergence limit to superloose, it converged quickly. input.zip

OK, so first a couple of things. Visualizing your system coordinates I see that the system was created in Amber and using a hexagonal unit cell system. Also the protein sticks out of the water box when the system is interpreted non-periodically. For QM/MM in ASH, the QM-part will always have to be done non-periodically so it is going to be necessary to first wrap the coordinates so that the protein is in the center of the box in order to do a sensible calculation. It is usually also better to have simpler cubic boxes for this purpose. If this is not an option

I would not at all use the superloose convergence setting, this is a dummy-setting just for basic testing.

For the very first QM/MM geometry optimization of a new system with a sizeable active region you should be prepared that many geometry optimization steps will initially be required. This is because your active region contains quite a lot of atoms (466 in your case), meaning 1398 degrees of freedom (quite a bit) and all of the atoms have large forces acting on them in the very beginning (since the system is an X-ray structure or from an MM simulation), this is simply a tough minimization problem that requires a lot of iterations. So it may actually be perfectly normal for the geometry optimization not too converge using the default max-iterations (or even 500). Without seeing the outputfile I can't really tell whether things were progressing well or not. Note that even though it sometimes seems that the system is never getting close to convergence (i.e. the gradient keep oscillating) but the energy is still getting lower and lower, then this still indicates that the optimization is progressing and will probably converge. Only if the energy is completely stuck (i.e. not changing or zig-zagging), would I say that something is really wrong. I would also visualize the geometry optimization trajectory to make sure things look like they are behaving, i.e. atoms are moving in a sensible way.

So for many new protein setups you may have to set maxiter to a really large number (1000 or more) or restart the optimization a couple of times (use the last set of coordinates in each restart). Because this can actually be quite expensive to do at the QM/MM level, it is good to make the QM-part really cheap in the beginning while you are in this phase. For example: use a small-basis set (double-zeta), use a non-hybrid (e.g. BP86 instead of B3LYP), use a very small QM-region (just the bare minimum). Another option is to switch to a semi-empirical level of theory during this phase, and here your best bet is probably to use the xtb-program with either the GFN1-xTB or GFN2-XTB method since you have Zn. Once you have been able to complete this initial optimization of the new system, things will be much easier. You can switch to your desired theory-level again and perhaps a larger QM-region and you are likely to reconverge in less than 100 iterations. When you change the geometry of the QM-region, e.g. exploring another isomer or conformer, you are also likely to converge in less than 100 iterations. It is just in this first stage where things will really take a long time, or if you increase the active region.

RagnarB83 commented 1 week ago
  1. I fixed the bug that gave an Attribute error for scans for QMMMTheory objects.

The fix is available on the new branch. Will be on the main branch in a week or so after a merge.

Interestingly, it was able to do the calculations. But there were some problems with the output (I was using QM/MM two CV scans).

 File "/home/casea/CASEADATA/Research Project/Pathway/BmFaldDH/Amber/ASH/PESScan/orca_pes.py", line 60, in <module>
   results = calc_surface(fragment=frag, theory=qmmmobject, scantype='UnRelaxed', resultfile='surface_results.txt', runmode='serial', 
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_surface.py", line 507, in calc_surface
   result.write_to_disk(filename="ASH_surface.result")
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_results.py", line 122, in write_to_disk
   f.write(json.dumps(newdict, allow_nan=True))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/__init__.py", line 231, in dumps
   return _default_encoder.encode(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 200, in encode
   chunks = self.iterencode(o, _one_shot=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 258, in iterencode
   return _iterencode(o, 0)
          ^^^^^^^^^^^^^^^^^
TypeError: keys must be str, int, float, bool or None, not tuple  File "/home/casea/CASEADATA/Research Project/Pathway/BmFaldDH/Amber/ASH/PESScan/orca_pes.py", line 60, in <module>
   results = calc_surface(fragment=frag, theory=qmmmobject, scantype='UnRelaxed', resultfile='surface_results.txt', runmode='serial', 
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_surface.py", line 507, in calc_surface
   result.write_to_disk(filename="ASH_surface.result")
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/site-packages/ash/modules/module_results.py", line 122, in write_to_disk
   f.write(json.dumps(newdict, allow_nan=True))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/__init__.py", line 231, in dumps
   return _default_encoder.encode(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 200, in encode
   chunks = self.iterencode(o, _one_shot=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/casea/mambaforge/envs/ASH/lib/python3.11/json/encoder.py", line 258, in iterencode
   return _iterencode(o, 0)
          ^^^^^^^^^^^^^^^^^
TypeError: keys must be str, int, float, bool or None, not tuple

In the meantime, is it possible to write a function so that each cv energy calculated can be output when it is updated, rather than outputting all together at the end?

This might actually be a bug, if you send me the inputscript and coordinate-files I will take a look at this and try to reproduce.

RagnarB83 commented 6 days ago

FYI: the tutorials have been updated so that the

ValueError: Found multiple NonbondedForce tags

problem is avoided.