Closed jchodera closed 8 years ago
It finds the small molecule every time now anyway, so this should be straightforward. As for unique residue names, are there restrictions on the characters I can use there? I don't recall if there was a consensus on another thread (maybe on the OpenMM issue tracker?), but absent an existing standard, I thought perhaps we could use the SMILES part of chemical_state_key
. For simulations where both the protein and ligand change, I had envisioned perhaps a chemical_state_key
that combined smiles and sequence with an underscore*.
*I don't think underscore is used by SMILES, which is why I suggested it. However, since the strings should be Unicode, we could always use emoji-delimiters :)
It doesn't need to be unique to each small molecule does it? Just unique within the topology -- i.e. could you just name it 'MOL' every time, so that the proposal engine identifies it?
otherwise +1 emojis
Do we need to allow the molecule residue names to change? I could imagine the simplest thing to do for now is to have unique three-letter residue names for molecules (eg MOL) that stay constant during the simulation. The current chemical identity in SMILES could be stored in the state key or the metadata for current state.
Protein sequences can change, but we use the chain id right now to identify which chain to mutate.
If we wanted to make things simple and consistent, we could simply have each part of the system that ProposalEngine can manipulate be in a separate chain, and exclusively use chainid.
Another possibility would be to subclass Topology to store more data for each molecule, such as allowing us to attach a SMILES string to a molecule, or indexing species that are being modified by perses.
I may have misunderstood then. I already find the molecule and assume (but this is a default parameter) it will be called MOL
. Not sure why I still take the receptor_topology
in that way. Will fix.
There's an issue in #109 that appears to be related:
======================================================================
ERROR: Testing expanded ensemble sampler with alkanes 'explicit'
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/samplers/samplers.py", line 697, in run
self.update()
File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/samplers/samplers.py", line 684, in update
self.update_sampler()
File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/samplers/samplers.py", line 603, in update_sampler
topology_proposal = self.proposal_engine.propose(system, topology)
File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/rjmc/topology_proposal.py", line 794, in propose
smiles_new, _ = self._topology_to_smiles(new_topology)
File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/rjmc/topology_proposal.py", line 835, in _topology_to_smiles
raise ValueError("More than one residue with the same name!")
ValueError: More than one residue with the same name!
I can take a stab at fixing this if there are no objections---should be simple!
Nope, I'll do it.
Thanks!
I think this is resolved.
Currently,
SmallMoleculeSetProposalEngine
takesreceptor_topology
in the constructor.Since the
current_topology
is passed into everypropose()
call, it would give us much more flexibility if the constructor instead took information on how to identify the small molecule (can we use a unique residue name for now?) and each timepropose()
is called, it finds the small molecule in thetopology
once again.https://github.com/choderalab/perses/blob/master/perses/rjmc/topology_proposal.py#L713