Amber ---> ffxml conversion script

rafwiewiora commented 8 years ago

Tagging @jchodera @swails @peastman

Here's what I've got so far: a conversion script, tested on protein only for now, on two systems. It works on a selection of leaprc's (the uncommented ones in the YAML); the output is in ffxml/ for you to check out.

Working on:

- whatever further leaprc's need: ff14ipq is not passing the energy test right now, gaff needs a test system, etc.
- the Source provenance information, which needs a tweak in the ParmEd code
- a nucleic acid test system

How does it look?

jchodera commented 8 years ago

What does the script need to run and how is it invoked? I can try to add a travis branch to enable testing of this code.

rafwiewiora commented 8 years ago

It needs the files/ folder as committed, plus AmberTools / Amber for the FF files and tleap. $AMBERHOME must be set, and tleap must be available from the OS command line (i.e. on PATH).

Executable by python conversion_script.py.
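
A pre-flight check for the requirements listed above could look like the following. This is only a minimal sketch: the function name `check_amber_environment` and the `files_dir` parameter are hypothetical and not part of the conversion script.

```python
import os
import shutil


def check_amber_environment(environ=None, files_dir="files"):
    """Return a list of problems; an empty list means the script can run.

    Hypothetical helper, not part of conversion_script.py.
    """
    environ = os.environ if environ is None else environ
    problems = []

    amberhome = environ.get("AMBERHOME")
    if not amberhome:
        problems.append("AMBERHOME is not set")
    elif not os.path.isdir(amberhome):
        problems.append("AMBERHOME does not point to a directory: " + amberhome)

    # tleap is called by name, so it has to be reachable through PATH.
    if shutil.which("tleap", path=environ.get("PATH", "")) is None:
        problems.append("tleap not found on PATH")

    # The committed input folder must be present next to the script.
    if not os.path.isdir(files_dir):
        problems.append("input folder missing: " + files_dir)

    return problems


if __name__ == "__main__":
    for problem in check_amber_environment():
        print("ERROR:", problem)
```

Returning a list rather than raising on the first problem lets a CI log show everything that is missing in one run.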

swails commented 8 years ago

FWIW, I use ambermini with the ParmEd testing suite and go ahead and just set AMBERHOME to the miniconda prefix -- see here

I like this approach because this is how Amber behaves "in the wild" (which means it's more closely testing the scenario where Amber is installed by the user). But ParmEd is designed to work alongside Amber, and it interacts more closely with more components of Amber.

rafwiewiora commented 8 years ago

Provenance layout is now updated to its final form. I also added a Test key to the YAML to indicate which tests should be performed on each leaprc conversion.

jchodera commented 8 years ago

Can we have ParmEd check if ambermini is installed and get AMBERHOME from there? There is no reason this wouldn't be sufficient for our purposes.

rafwiewiora commented 8 years ago

> Can we have ParmEd check if ambermini is installed and get AMBERHOME from there? There is no reason this wouldn't be sufficient for our purposes.

Sorry, I deleted that comment the moment I wrote it and realized I was wrong! I simply added parmed.amber.AMBERHOME = AMBERHOME and we're all good.

rafwiewiora commented 8 years ago

Ok, we have most forcefields converted at this point.

Note that I needed to add try / except AssertionError handling to allow the impropers tolerance to go up to 2e-2; this was needed for ff03ua to pass.
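
The fallback described above can be sketched roughly as follows. The function names, the use of a relative difference, and the strict default of `1e-5` are all assumptions for illustration; only the relaxed limit of 2e-2 comes from this thread.

```python
def assert_energies_match(reference, converted, tolerance):
    """Raise AssertionError if the relative difference exceeds tolerance."""
    rel_diff = abs(reference - converted) / max(abs(reference), 1e-12)
    assert rel_diff <= tolerance, (
        "relative difference %.3g exceeds tolerance %.3g" % (rel_diff, tolerance)
    )


def check_impropers(reference, converted, strict_tol=1e-5, relaxed_tol=2e-2):
    """Try the strict tolerance first, then fall back to the relaxed one.

    strict_tol is a placeholder value; relaxed_tol is the 2e-2 from the thread.
    Returns the tolerance that passed so the caller can record it.
    """
    try:
        assert_energies_match(reference, converted, strict_tol)
        return strict_tol
    except AssertionError:
        # If this second check fails too, the AssertionError propagates.
        assert_energies_match(reference, converted, relaxed_tol)
        return relaxed_tol
```

Returning the tolerance that was actually needed makes it easy to flag which force fields only passed under the relaxed limit.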

Working on nucleic acid tests now.

jchodera commented 8 years ago

Can we capture the errors from the testing into an output file? This could be very useful in documenting the validation procedure.

rafwiewiora commented 8 years ago

Definitely! I'm going to add a log file functionality.

jchodera commented 8 years ago

Something computer-readable might also be good. We could make a table of the validation.

rafwiewiora commented 8 years ago

I like that. Maybe write out to a YAML then?

jchodera commented 8 years ago

Whatever is convenient. CSV, YAML, XML, pickle...

rafwiewiora commented 8 years ago

Alright!

rafwiewiora commented 8 years ago

Quick question: which is preferable for energy validation?

1. Store PDBs without hydrogens and rely on tleap adding them. This is OK no matter where the H's get placed, because the topology and positions for the ffxml energy calculation are pulled from the prmtop input (i.e. the PDB is only used by tleap).
2. Store PDBs with all hydrogens.

For now I have gone with the second option, because it seems more reproducible. An advantage of the first option is not having to worry about getting the hydrogens right for all FFs, although the only case where this has mattered so far is the united-atom force field. With option one I can use the same PDB for explicit-atom and united-atom FFs; with option two I need separate PDBs for united-atom.

What do you think?

swails commented 8 years ago

You're interested in UA FFs from Amber? Nobody has worked on those in >a decade...

I'd personally go with hydrogens and then strip them out using ParmEd for UA FFs where they're not needed.

rafwiewiora commented 8 years ago

> You're interested in UA FFs from Amber? Nobody has worked on those in >a decade...

Honestly, it takes me less time to add a few lines to the script and convert everything than to worry about what's being used and what is not at this stage. You guys can decide later which FFs to PR into OpenMM, but I want the capability to convert everything (within reason; I'm not touching those GLYCAMs).

> I'd personally go with hydrogens and then strip them out using ParmEd for UA FFs where they're not needed.

Thanks!

swails commented 8 years ago

Fair enough :smiley:

rafwiewiora commented 8 years ago

Another quick question! I'm using 4RZN for DNA validation, cleaned up with PDBFixer. It works fine except for older FFs using all_nucleic94.lib, whose atom naming differs slightly from the newer nucleic libs, from what comes with the downloaded PDB, and from what PDBFixer and OpenMM output.

Is there a tool out there to do the new names ---> old names conversion?

swails commented 8 years ago

> Is there a tool out there to do the new names ---> old names conversion?

Not that I know of... But if you download 4RZN from the PDB, it has PDB 3 naming (which will work with nucleic10.lib, but not all_nucleic94.lib).

You could probably generate a mapping yourself just by comparing those two lib files, actually...

rafwiewiora commented 8 years ago

So I just reversed the old --> new mapping from leaprc.ff14SB to new --> old and added it:

addPdbResMap {
{ 0 "DG" "DG5"  } { 1 "DG" "DG3"  }
{ 0 "DA" "DA5"  } { 1 "DA" "DA3"  }
{ 0 "DC" "DC5"  } { 1 "DC" "DC3"  }
{ 0 "DT" "DT5"  } { 1 "DT" "DT3"  }
}
addPdbAtomMap {
{ "H1'" "H1*" }
{ "H2'" "H2'1" }
{ "H2''" "H2'2" }
{ "H3'" "H3*" }
{ "H4'" "H4*" }
{ "H5'" "H5'1" }
{ "H5''" "H5'2" }
{ "HO2'" "HO'2" }
{ "HO5'" "H5T"  }
{ "HO3'" "H3T" }
{ "OP1" "O1P" }
{ "OP2" "O2P" }
}

to the LEaP script before the source call, so it can be overwritten by the map calls in the newer leaprcs. Works well!
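
The reversal itself is mechanical and could be scripted. Below is a minimal sketch; `invert_atom_map` and `format_add_pdb_atom_map` are hypothetical helper names, and the old/new pairs in the usage example are the reverse of entries shown in the block above.

```python
def invert_atom_map(old_to_new):
    """Swap each (old, new) pair into a new -> old dict.

    Raises ValueError if two old names map to the same new name,
    since the inversion would then be ambiguous.
    """
    new_to_old = {}
    for old, new in old_to_new:
        if new in new_to_old:
            raise ValueError("two old names map to %r" % new)
        new_to_old[new] = old
    return new_to_old


def format_add_pdb_atom_map(mapping):
    """Render a dict of PDB name -> library name as an addPdbAtomMap block."""
    lines = ["addPdbAtomMap {"]
    for pdb_name, lib_name in mapping.items():
        lines.append('{ "%s" "%s" }' % (pdb_name, lib_name))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Old -> new pairs, i.e. the direction used in the newer leaprc files.
    pairs = [("O1P", "OP1"), ("O2P", "OP2"), ("H5T", "HO5'")]
    print(format_add_pdb_atom_map(invert_atom_map(pairs)))
```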

rafwiewiora commented 8 years ago

Interestingly, some of these leaprc's add the terminal mappings (e.g. 0 G --> DG5, 1 G --> DG3) on the assumption that any unspecified nucleotide is DNA, but they forget about 0 DG --> DG5, 1 DG --> DG3, etc.
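
A gap like this can be spotted automatically once the addPdbResMap entries have been read into tuples. The sketch below assumes that parsing has already been done; `missing_terminal_maps` is a hypothetical helper, with terminus 0 meaning the first residue of a chain and 1 the last, as in the LEaP blocks above.

```python
def missing_terminal_maps(res_map, residue_names):
    """List (terminus, pdb_name) pairs that have no terminal mapping.

    res_map: iterable of (terminus, pdb_name, lib_name) tuples.
    residue_names: PDB residue names that should be covered.
    """
    covered = {(terminus, pdb_name) for terminus, pdb_name, _ in res_map}
    return [
        (terminus, name)
        for name in residue_names
        for terminus in (0, 1)
        if (terminus, name) not in covered
    ]


if __name__ == "__main__":
    # Only the bare "G" name is mapped, as in the leaprc's described above.
    res_map = [(0, "G", "DG5"), (1, "G", "DG3")]
    print(missing_terminal_maps(res_map, ["G", "DG"]))
```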

rafwiewiora commented 8 years ago

DNA and RNA testing added; improper testing for those is turned off for now, pending https://github.com/choderalab/openmm/issues/9

rafwiewiora commented 8 years ago

Closing to open a fresh one.
