jchodera closed this 8 years ago
Just bumping this thread.
In talking with @bas-rustenburg, we thought it might be useful to have separate command-line modes, something like
yank setup --receptor receptorspec --ligand ligandspec --forcefield amber99sbildn --ffgen gaff
- set up using internal parameterization stuff (with help from OpenEye tools, OpenMM, pdbfixer, and gaff2xml)
yank ambersetup --prmtop prmtopfile --inpcrd inpcrdfile
- set up from LEaP files
The current scheme would effectively become yank ambersetup, while the new scheme (yank setup) would try to parameterize whatever you threw at it (PDB, RCSB ID, mol2, SDF, IUPAC, SMILES, etc.).
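To make "parameterize whatever you threw at it" a bit more concrete, here is a minimal sketch of how the input kind might be guessed from the specification string. The function name, the extension table, and the returned labels are all made up for illustration; none of this is existing YANK code. It also assumes classic four-character RCSB IDs (a digit followed by three alphanumeric characters).

```python
import re
from pathlib import Path

# Hypothetical mapping from file extension to input kind (not part of YANK).
EXTENSION_KINDS = {".pdb": "pdb", ".mol2": "mol2", ".sdf": "sdf"}


def guess_input_kind(spec: str) -> str:
    """Guess what kind of molecule specification `spec` is."""
    suffix = Path(spec).suffix.lower()
    if suffix in EXTENSION_KINDS:
        return EXTENSION_KINDS[suffix]
    # Assumption: classic RCSB IDs are a digit plus three alphanumerics.
    if re.fullmatch(r"[0-9][A-Za-z0-9]{3}", spec):
        return "rcsb_id"
    # SMILES and IUPAC names cannot be told apart reliably from the string
    # alone, so a real tool would have to try a parser for each.
    return "smiles_or_iupac"


for spec in ["receptor.pdb", "ligands.mol2", "3QCY", "CCO"]:
    print(spec, "->", guess_input_kind(spec))
```

Guessing like this is convenient but ambiguous at the edges, which may be an argument for explicit prefixes when the guess could be wrong.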
Some further refinements to this idea:
To set up from AMBER LEaP files:
yank import amber --ligand_prmtop ligand.prmtop --receptor_prmtop receptor.prmtop --complex_prmtop complex.prmtop --complex_crd complex.crd [--receptor_crd receptor.crd --ligand_crd ligand.crd]
yank import filetype forms could be added later to allow import from CHARMM, gromacs, etc. as OpenMM adds support for them.
To initialize YANK simulations using the OpenMM app and gaff2xml to build things, I think we want syntax like these use cases:
# Set up protein (PDB) and ligands (mol2) in implicit solvent
yank setup --receptor receptor.pdb --receptor_forcefield ffxml:ff99sb --ligand ligands.mol2 --ligand_forcefield gaff2xml:gaff:bcc --implicit --destdir complexes/
There's actually a lot of data we might want to cram in there: which parameterization scheme to use (ffxml vs gaff2xml), and which forcefield or charge model to use with it (ff99sb.xml, gaff.dat, bcc charges). I do wonder if we really need some sort of dict-like way to specify parameters, like a JSON or XML format for setting things up.
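As a rough illustration of the dict-like alternative, the colon-separated strings from the example above can be split into named fields and dumped as JSON. This is only a sketch: the field names (scheme, forcefield, charges) and the layout of the resulting dict are assumptions, not an agreed format.

```python
import json


def parse_forcefield_spec(spec: str) -> dict:
    """Split 'scheme:forcefield[:charges]' into a dict of named fields."""
    parts = spec.split(":")
    if not 2 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"expected 'scheme:forcefield[:charges]', got {spec!r}")
    # zip() stops at the shorter sequence, so 'charges' is simply absent
    # when the spec has only two parts.
    return dict(zip(("scheme", "forcefield", "charges"), parts))


# Hypothetical dict-style equivalent of the first use case above.
setup = {
    "receptor": {"file": "receptor.pdb", **parse_forcefield_spec("ffxml:ff99sb")},
    "ligand": {"file": "ligands.mol2", **parse_forcefield_spec("gaff2xml:gaff:bcc")},
    "solvent": "implicit",
}
print(json.dumps(setup, indent=2))
```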
# Set up host (mol2) and guests (mol2) in explicit solvent with 10 A buffer region
yank setup --receptor host.mol2 --receptor_forcefield gaff2xml:gaff:bcc --ligand guests.mol2 --ligand_forcefield gaff2xml:gaff:bcc --explicit "10*angstrom" --destdir host-guest/
# Set up receptor (PDB) with some ChemDraw sketches
yank setup --receptor pdbid:3QCY --receptor_forcefield ffxml:ff99sb --ligand ideas.cdx --ligand_forcefield gaff2xml:gaff:bcc --implicit --destdir ouathek-ideas/
Just some thoughts. I think this needs further refinement.
Maybe we should start collecting use cases on a wiki page?
We can discuss here, right? It's probably good to keep pinging people periodically otherwise people won't see the discussion.
Sure, it's good to discuss here, but we can also compile (cut and paste) into the wiki once we have an idea of what real use cases are like.
yes
I think we should be cautious about requiring too many command line flags that all must be specified at once. Too many will be very annoying to users, especially if a typo is made.
If we want to accept a large number of inputs, we should probably come up with a single input file that can either be written on its own (like an XML file), or at least have yank setup write to a common file so it can be copied, edited, and loaded as needed. This way a user could run commands either all at once or in fragments, which makes the setup easier to maintain.
For instance, if one wants to set up the PDB and mol2 with several ligands, they would run
# Pull in receptor information
yank setup --receptor receptor.pdb --receptor_forcefield ffxml:ff99sb
# Pull in ligand information
yank setup --ligand ligands.mol2 --ligand_forcefield gaff2xml:gaff:bcc
# Set other flags
yank setup --implicit --destdir complexes/
and all of this would write to a single, portable XML file. They could also run all of this in one line. Then if they want to run the same simulation but change the ligand forcefield, they would just rerun:
yank setup --ligand_forcefield ffxml:ff99sb
targeting the new forcefield file and changing the entry in the XML file.
I think this would make it easier for users to create and change simulations, and it would also give a common file that could be passed to others wanting to repeat or slightly tweak a yank simulation. One drawback is that yank run would need to validate that the XML was complete and/or fill in defaults for missing keys.
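A minimal sketch of the incremental idea, using only the standard library: each call updates a few fields in one XML file, and a loader fills in defaults and complains about missing keys. The element names, the default values, and the list of required keys are all hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical defaults and required keys; a real list would need discussion.
DEFAULTS = {"solvent": "implicit", "destdir": "."}
REQUIRED = ("receptor", "receptor_forcefield", "ligand", "ligand_forcefield")


def update_setup(path, **fields):
    """Create or update the setup file, overwriting only the given fields."""
    path = Path(path)
    root = ET.parse(str(path)).getroot() if path.exists() else ET.Element("yank_setup")
    for key, value in fields.items():
        node = root.find(key)
        if node is None:
            node = ET.SubElement(root, key)
        node.text = str(value)
    ET.ElementTree(root).write(str(path), encoding="unicode")


def load_setup(path):
    """Read the file, fill in defaults, and fail on missing required keys."""
    root = ET.parse(str(path)).getroot()
    config = {**DEFAULTS, **{child.tag: child.text for child in root}}
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"incomplete setup file, missing: {', '.join(missing)}")
    return config


with tempfile.TemporaryDirectory() as tmp:
    setup_file = Path(tmp) / "setup.xml"
    update_setup(setup_file, receptor="receptor.pdb", receptor_forcefield="ffxml:ff99sb")
    update_setup(setup_file, ligand="ligands.mol2", ligand_forcefield="gaff2xml:gaff:bcc")
    # Later: change only the ligand forcefield, leaving everything else alone.
    update_setup(setup_file, ligand_forcefield="ffxml:ff99sb")
    config = load_setup(setup_file)
    print(config)
```

The validation step is where the drawback mentioned above shows up: the loader has to know which keys are mandatory and which have sensible defaults.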
I also like the idea of using a single editable input file, with command-line flags for certain common options. I suggest command-line flags would take priority if the same fields are specified in the input file (and a note could be printed by the code to indicate this behavior to the user).
The other advantage here is that the input file can be referenced by the user (or another user) at a later point.
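The precedence rule could be sketched roughly as below. The function name and the wording of the note are placeholders; the only behavior taken from the suggestion is that a command-line value wins over the file and the user is told about it.

```python
def merge_options(file_options: dict, cli_options: dict) -> dict:
    """Overlay command-line options on top of options read from the input file."""
    merged = dict(file_options)
    for key, value in cli_options.items():
        if value is None:  # flag was not given on the command line
            continue
        if key in merged and merged[key] != value:
            print(
                f"note: command-line value {value!r} overrides "
                f"{key}={merged[key]!r} from the input file"
            )
        merged[key] = value
    return merged


merged = merge_options(
    {"destdir": "complexes/", "solvent": "implicit"},
    {"destdir": "run2/", "solvent": None},
)
print(merged)
```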
I find YAML (http://www.yaml.org/start.html) is quite a nice format for user-editable files. I definitely prefer it to editing XML, and the syntax is pretty intuitive.
Another option is to use a Python file (e.g. "yank_project_config.py") which can then be imported directly by Yank as a module.
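For the Python-file option, a sketch of loading such a file from an explicit path with the standard importlib machinery. The variable names inside the config file are invented for the example, and the file is written to a temporary directory only so the snippet is self-contained.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_config_module(path):
    """Import a Python file from an explicit path and return it as a module."""
    spec = importlib.util.spec_from_file_location("yank_project_config", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "yank_project_config.py"
    # Hypothetical contents; the variable names are not an agreed convention.
    config_path.write_text(
        'receptor = "receptor.pdb"\n'
        'ligand_forcefield = "gaff2xml:gaff:bcc"\n'
    )
    config = load_config_module(config_path)
    print(config.receptor, config.ligand_forcefield)
```

One thing to keep in mind with this approach is that importing the file executes whatever code it contains, so a config received from someone else has to be trusted in the same way as any other script.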
I agree with Danny
I like the idea of a file too, with command line overrides. If we used JSON (or YAML) and only permitted a few arguments to be overridden (e.g. number of iterations) then the driver would not be too complex.
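A sketch of what such a driver might look like if the set of overridable arguments is kept to a small whitelist. The whitelist contents and the JSON keys are assumptions for illustration; only "number of iterations" comes from the comment above.

```python
import argparse
import json

# Hypothetical whitelist: option name -> type used to parse it.
OVERRIDABLE = {"iterations": int}


def load_options(json_text, argv):
    """Read options from JSON, then apply only the whitelisted overrides."""
    options = json.loads(json_text)
    parser = argparse.ArgumentParser(prog="yank")
    for name, kind in OVERRIDABLE.items():
        parser.add_argument(f"--{name}", type=kind, default=None)
    # Any flag outside the whitelist makes argparse exit with an error.
    args = parser.parse_args(argv)
    for name in OVERRIDABLE:
        value = getattr(args, name)
        if value is not None:
            options[name] = value
    return options


options = load_options(
    '{"receptor": "receptor.pdb", "iterations": 1000}',
    ["--iterations", "50"],
)
print(options)
```

Because the flags are generated from the whitelist, adding another overridable option is a one-line change and the driver itself stays small.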
On the other hand, we could switch gears and just focus on the Python interfaces right now and have each setup script for a particular application actually be a Python script. That would be harder for novice users, but would give us maximum flexibility for what we need to do now without being locked into the effort of worrying about file and command line parsing...
> On the other hand, we could switch gears and just focus on the Python interfaces right now and have each setup script for a particular application actually be a Python script.
I strongly agree with this idea--i.e. design the "objects" first and let the command line follow.
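To give the "objects first" idea something concrete to argue about, here is one possible shape for the data, written as plain dataclasses. The class and attribute names are invented for this sketch and are not YANK's classes; the values are taken from the host-guest use case earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """One molecule source plus how to parameterize it (hypothetical)."""
    source: str
    forcefield: str


@dataclass
class BindingSetup:
    """Everything a setup script would hand over (hypothetical)."""
    receptor: Component
    ligand: Component
    destdir: str
    explicit_buffer: Optional[str] = None  # None means implicit solvent

    def describe(self) -> str:
        if self.explicit_buffer:
            solvent = f"explicit ({self.explicit_buffer} buffer)"
        else:
            solvent = "implicit"
        return (
            f"{self.receptor.source} + {self.ligand.source} "
            f"in {solvent} solvent -> {self.destdir}"
        )


host_guest = BindingSetup(
    receptor=Component("host.mol2", "gaff2xml:gaff:bcc"),
    ligand=Component("guests.mol2", "gaff2xml:gaff:bcc"),
    destdir="host-guest/",
    explicit_buffer="10*angstrom",
)
print(host_guest.describe())
```

A command line or input-file parser added later would then only need to build these objects, rather than define the behavior itself.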
+1
OK, let's switch gears and sketch out the interface to tackle the kinds of use cases we want to deal with.
Some example use cases (an expanded version of the above):
- inpcrd files for complex, ligand, receptor along with (possibly multi-model) PDB file or inpcrd file for complex
- ffxml file, multiple choices for ligand to be parameterized with gaff2xml, potentially including generating/expanding conformers or a molecular topology
Much of the file processing could be driver-specific. We essentially need to focus on exactly what kinds of data will go into the YANK-provided classes.
All of this is implemented.
We have to make some decisions about how we tell YANK what we want it to do.
To be specific, we need to tell it:
We have a few options for how to specify this:
- Yank module. All parameters are coded in Python.
- yank setup to set up a calculation
- yank run to run/resume a calculation
- yank info to get some quick info on progress
- yank analyze to analyze a calculation

Thoughts?
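For the command-line option, the four subcommands map naturally onto argparse subparsers. This is a sketch only: the --store flag is a hypothetical placeholder for "where the calculation lives", and none of the subcommands do anything here.

```python
import argparse

SUBCOMMANDS = [
    ("setup", "set up a calculation"),
    ("run", "run/resume a calculation"),
    ("info", "get some quick info on progress"),
    ("analyze", "analyze a calculation"),
]


def build_parser():
    """Build a parser with one subparser per yank subcommand."""
    parser = argparse.ArgumentParser(prog="yank")
    subcommands = parser.add_subparsers(dest="command", required=True)
    for name, help_text in SUBCOMMANDS:
        sub = subcommands.add_parser(name, help=help_text)
        # Hypothetical shared option: where the calculation lives on disk.
        sub.add_argument("--store", default=".")
    return parser


args = build_parser().parse_args(["run", "--store", "complexes/"])
print(args.command, args.store)
```

If the Python module is designed first, each subcommand can stay a thin wrapper that parses its flags and calls into it.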