Use Open Force Field parameters for ligands #1211

Open daveminh opened 4 years ago

daveminh commented 4 years ago

There is no clear way to use Open Force Field parameters for ligands, because the YAML input expects a mol2 file and leaprc.gaff. It might be possible if the ligands could be set up from prmtop and inpcrd files, which would just need to be exported from the Open Force Field toolkit.
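As a rough sketch of that export step, the snippet below parameterizes a single ligand with the Open Force Field toolkit and writes prmtop and inpcrd files through ParmEd. The helper name `export_ligand_to_amber`, the file arguments, and the `openff-1.0.0.offxml` force field choice are assumptions for illustration. Older toolkit releases use the `openforcefield` package name, so both imports are tried.

```python
def export_ligand_to_amber(sdf_path, pdb_path, prefix, offxml="openff-1.0.0.offxml"):
    """Write <prefix>.prmtop and <prefix>.inpcrd for one ligand (hypothetical helper)."""
    # Imports are local so this module loads even where the toolkits are not installed.
    import parmed
    try:
        from openff.toolkit.topology import Molecule, Topology
        from openff.toolkit.typing.engines.smirnoff import ForceField
    except ImportError:  # older releases shipped as "openforcefield"
        from openforcefield.topology import Molecule, Topology
        from openforcefield.typing.engines.smirnoff import ForceField
    try:
        from openmm.app import PDBFile
    except ImportError:  # older OpenMM releases live under simtk
        from simtk.openmm.app import PDBFile

    # The SDF supplies chemistry (bond orders, charges); the PDB supplies coordinates.
    molecule = Molecule.from_file(sdf_path)
    pdb = PDBFile(pdb_path)
    topology = Topology.from_openmm(pdb.topology, unique_molecules=[molecule])

    # Assign SMIRNOFF parameters and build an OpenMM System.
    system = ForceField(offxml).create_openmm_system(topology)

    # ParmEd converts the OpenMM System into a structure it can save in Amber formats.
    structure = parmed.openmm.load_topology(pdb.topology, system=system, xyz=pdb.positions)
    structure.save(prefix + ".prmtop", overwrite=True)
    structure.save(prefix + ".inpcrd", overwrite=True)
    return structure
```

This covers the ligand alone. A solvated ligand or a protein-ligand complex would also need protein and water parameters, which this sketch does not handle.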

daveminh commented 4 years ago

It looks like YANK will bypass its own setup if I give the prmtop and inpcrd files the correct names.

andrrizzi commented 4 years ago

That's correct. You can bypass the automatic setup pipeline and provide your own system files that you can prepare independently with the openforcefield toolkit. See here for the syntax:
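As a hedged sketch of what such a configuration might look like, the helper below prints a `systems` stanza that points at pre-built Amber files. The key names `phase1_path`, `phase2_path`, `ligand_dsl`, and `solvent` are written from memory of the YANK documentation and should be checked against it. The system name, file prefixes, the `resname MOL` selection, and the `pme` solvent label (which would have to be defined in a separate `solvents` section) are hypothetical.

```python
import os


def yank_systems_stanza(name, complex_prefix, solvent_prefix,
                        ligand_dsl="resname MOL", solvent="pme"):
    """Return a YAML `systems` stanza pointing at existing prmtop/inpcrd pairs."""
    lines = [
        "systems:",
        f"  {name}:",
        f"    phase1_path: [{complex_prefix}.prmtop, {complex_prefix}.inpcrd]",
        f"    phase2_path: [{solvent_prefix}.prmtop, {solvent_prefix}.inpcrd]",
        f"    ligand_dsl: {ligand_dsl}",
        f"    solvent: {solvent}",
    ]
    return "\n".join(lines) + "\n"


def missing_files(*prefixes):
    """List the prmtop/inpcrd files that do not exist for the given prefixes."""
    return [prefix + ext
            for prefix in prefixes
            for ext in (".prmtop", ".inpcrd")
            if not os.path.isfile(prefix + ext)]


if __name__ == "__main__":
    # Hypothetical file prefixes; replace with the real build outputs.
    print(yank_systems_stanza("mpro-ligand", "complex", "ligand"))
    print("Missing:", missing_files("complex", "ligand"))
```

Checking for missing files before launching avoids discovering a path typo only after YANK has started.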

jchodera commented 4 years ago

It should be relatively straightforward to add a simple additional step to the pipeline to re-assign parameters to the whole system using the openmmforcefields SystemGenerator. I'll make a plan to add this.
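For orientation, a minimal sketch of what re-assigning parameters with the openmmforcefields `SystemGenerator` could look like outside of YANK is shown below. This is not the planned pipeline step itself. The function name `make_openff_system`, the Amber protein and water force field files, and the `openff-1.0.0` default are assumptions chosen for illustration.

```python
def make_openff_system(topology, molecules, small_molecule_forcefield="openff-1.0.0"):
    """Build an OpenMM System with OpenFF ligand parameters (hypothetical helper).

    topology  : an OpenMM Topology for the whole system (protein, ligand, solvent)
    molecules : OpenFF Molecule objects for the ligands present in the topology
    """
    # Local import: openmmforcefields and OpenMM must be installed to call this.
    from openmmforcefields.generators import SystemGenerator

    generator = SystemGenerator(
        # Assumed protein and water force fields; swap in whatever the study needs.
        forcefields=["amber/ff14SB.xml", "amber/tip3p_standard.xml"],
        small_molecule_forcefield=small_molecule_forcefield,
        molecules=molecules,
    )
    # Ligand residues are matched against `molecules` and parameterized with OpenFF.
    return generator.create_system(topology)
```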

daveminh commented 4 years ago

Even if the new force fields are not much more accurate than GAFF, they cover more chemical space. I was trying to set up calculations for several known inhibitors of the 3CL protease from SARS-CoV-2 to demonstrate for my class. I couldn't set up most of them with antechamber. However, I was able to build them with the Open Force Field toolkit and add solvent with the modeller module in OpenMM. Then I ran them with OpenMM for 5 ns. For four of the five systems, both the ligand in solvent and the complex run and seem to equilibrate.

However, I'm having trouble running them in YANK, which reports "Potential energy is NaN after 0 attempts of integration with move LangevinSplittingDynamicsMove Attempting a restart...". This happens whether or not I minimize, and it happened before I tried to equilibrate.

This is my attempt at the YAML script that directly loads AMBER inpcrd and prmtop files. Is there something wrong with it?

daveminh commented 4 years ago

I ran it again with the same problem but let the error logs complete.
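One inexpensive check when a run reports NaN energies from the first step is to confirm that the exported inpcrd file itself contains only finite numbers. The sketch below assumes the standard Amber restart layout (a title line, an atom-count line, then 12-character fixed-width fields) and flags overflowed (`************`) or NaN entries. The function name is hypothetical, and a clean result does not rule out problems in the parameters or in atom overlaps.

```python
import math


def nonfinite_fields(inpcrd_text, width=12):
    """Return (line_number, field) pairs that do not parse as finite floats."""
    bad = []
    lines = inpcrd_text.splitlines()
    # Skip the title line and the atom-count line; everything after is numeric.
    for lineno, line in enumerate(lines[2:], start=3):
        for start in range(0, len(line), width):
            field = line[start:start + width]
            if not field.strip():
                continue
            try:
                value = float(field)
            except ValueError:
                # Fortran writes asterisks when a number overflows its field.
                bad.append((lineno, field.strip()))
                continue
            if not math.isfinite(value):
                bad.append((lineno, field.strip()))
    return bad


if __name__ == "__main__":
    # Hypothetical two-atom example with one overflowed coordinate.
    sample = "title\n     2\n" + f"{1.0:12.7f}" + "*" * 12 + f"{3.0:12.7f}" + "\n"
    print(nonfinite_fields(sample))
```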

jchodera commented 4 years ago

@daveminh: Can you post a tarball of some input files you would like to use through the normal YANK pipeline? If you configure them with GAFF now, I can quickly add a pipeline stage that reassigns parameters with the openforcefield force field.

To work, the systems would need to successfully parameterize with GAFF first. We'd simply add a second step that replaces the parameters with openff + OpenMM force fields after that.

daveminh commented 4 years ago

Some of the systems failed with GAFF. But I can try to get one of them working...

jchodera commented 4 years ago

Hm. Can you give some examples that fail? Maybe we can come up with a pipeline that works around this.

jchodera commented 4 years ago

Here's Figure 3 from the Jin et al. paper:

[image: Figure 3 from Jin et al.]

I don't think openff can cover Ebselen (which has a selenium), but I think we can cover the rest.

I'll look into what's going on with the files you've already created!

daveminh commented 4 years ago

Yeah, I’m working on the failures. I was trying to make a pipeline around it by building the inpcrd and prmtop files directly.

jchodera commented 4 years ago

The log link you posted doesn't seem to resolve.

jchodera commented 4 years ago

The YAML file looks good to me. Trying to run locally.

daveminh commented 4 years ago

Just tried it and it works for me. Try again?

jchodera commented 4 years ago

Do you have the prmtop files in one of your branches? I checked out master, but only see .chk and .inpcrd files in complexes/1-equilibrate.

In general, the checkpoint files should only be used for temporary checkpointing since they are not platform-portable. state.xml files (especially gzipped ones) will be portable, or you can use other formats that have box vectors too.
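As an illustration of the portable alternative, the sketch below serializes an OpenMM state to XML and writes it gzipped. The function names and the `state.xml.gz` default are assumptions. The serialized state carries positions, velocities, and the periodic box vectors. Both the current `openmm` and the older `simtk.openmm` import paths are tried.

```python
import gzip


def write_gzipped_xml(xml_text, path):
    """Write an XML string to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(xml_text)


def read_gzipped_xml(path):
    """Read back an XML string written by write_gzipped_xml."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


def save_portable_state(simulation, path="state.xml.gz"):
    """Serialize the current state of an OpenMM Simulation to gzipped XML."""
    try:
        from openmm import XmlSerializer
    except ImportError:  # older OpenMM releases live under simtk
        from simtk.openmm import XmlSerializer

    state = simulation.context.getState(getPositions=True, getVelocities=True)
    write_gzipped_xml(XmlSerializer.serialize(state), path)
```

Unlike a binary checkpoint, the XML can be read back on a different platform or machine with `XmlSerializer.deserialize(read_gzipped_xml(path))`.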

daveminh commented 4 years ago

The prmtop files are in ../0-build/

daveminh commented 4 years ago

The paper also identifies cinanserin as a weaker binder. That’s helpful for testing dynamic range, since the dataset would then span three orders of magnitude.

daveminh commented 4 years ago

GAFF setup seems to work for ZINC000001714738, ZINC000002015152, and ZINC000003951740. Antechamber complains about carbon valences in Tideglusib (ZINC000013985228) and Carmofur (ZINC000001542916).

daveminh commented 4 years ago

This is a YAML file that doesn't work with antechamber.

jchodera commented 4 years ago

I seem to be able to get MPro_ZINC000001714738.yaml to run. Is this one of the examples that NaNs?

Are you using OpenMM 7.4.1? Can you paste the conda list output for your environment?

Finally, I was trying to access your experiments.log, but that file doesn't seem to exist.

daveminh commented 4 years ago

Okay I've organized the files a bit more clearly. I am trying two ways to set up the systems.

1) The first is with the Open Force Field Toolkit. It succeeded in setting up the systems (5/5) and in running OpenMM for 5 ns (4/5). In YANK, however, it gives NaN errors for ZINC000001714738, ZINC000002015152, and ZINC000013985228, while ZINC000003951740 seems to run fine.

2) The second is with GAFF, which you suggested as a way to start setup for the new force fields. It succeeded in setting up 2/5 systems, and those two seem to run fine. For ZINC000013985228 and ZINC000001542916, antechamber complains. One system is still being set up.

