choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License
181 stars 51 forks source link

SmallMoleculeSetProposalEngine complains about "More than one residue with the same name" #110

Closed jchodera closed 8 years ago

jchodera commented 8 years ago

Currently, SmallMoleculeSetProposalEngine takes receptor_topology in the constructor.

Since the current_topology is passed into every propose() call, it would give us much more flexibility if the constructor instead took information on how to identify the small molecule (can we use a unique residue name for now?) and each time propose() is called, it finds the small molecule in the topology once again.

https://github.com/choderalab/perses/blob/master/perses/rjmc/topology_proposal.py#L713

pgrinaway commented 8 years ago

It finds the small molecule every time now anyway, so this should be straightforward. As for unique residue names, are there restrictions on the characters I can use there? I don't recall if there was a consensus on another thread (maybe on the OpenMM issue tracker?), but absent an existing standard, I thought perhaps we could use the SMILES part of chemical_state_key. For simulations where both the protein and ligand change, I had envisioned perhaps a chemical_state_key that combined smiles and sequence with an underscore*.

*I don't think underscore is used by SMILES, which is why I suggested it. However, since the strings should be Unicode, we could always use emoji-delimiters :)

juliebehr commented 8 years ago

It doesn't need to be unique to each small molecule does it? Just unique within the topology -- i.e. could you just name it 'MOL' every time, so that the proposal engine identifies it?

otherwise +1 emojis

jchodera commented 8 years ago

Do we need to allow the molecule residue names to change? I could imagine the simplest thing to do for now is to have unique three-letter residue names for molecules (eg MOL) that stay constant during the simulation. The current chemical identity in SMILES could be stored in the state key or the metadata for current state.

Protein sequences can change, but we use the chain id right now to identify which chain to mutate.

If we wanted to make things simple and consistent, we could simply have each part of the system that ProposalEngine can manipulate be in a separate chain, and exclusively use chainid.

jchodera commented 8 years ago

Another possibility would be to subclass Topology to store more data for each molecule, such as allowing us to attach a SMILES string to a molecule, or indexing species that are being modified by perses.

pgrinaway commented 8 years ago

I may have misunderstood then. I already find the molecule and assume (but this is a default parameter) it will be called MOL. Not sure why I still take the receptor_topology in that way. Will fix.

jchodera commented 8 years ago

There's an issue in #109 that appears to be related:

======================================================================
ERROR: Testing expanded ensemble sampler with alkanes 'explicit'
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/choderaj/miniconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/samplers/samplers.py", line 697, in run
    self.update()
  File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/samplers/samplers.py", line 684, in update
    self.update_sampler()
  File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/samplers/samplers.py", line 603, in update_sampler
    topology_proposal = self.proposal_engine.propose(system, topology)
  File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/rjmc/topology_proposal.py", line 794, in propose
    smiles_new, _ = self._topology_to_smiles(new_topology)
  File "/Users/choderaj/github/choderalab/perses/perses.choderalab/perses/rjmc/topology_proposal.py", line 835, in _topology_to_smiles
    raise ValueError("More than one residue with the same name!")
ValueError: More than one residue with the same name!
jchodera commented 8 years ago

I can take a stab at fixing this if there are no objections---should be simple!

pgrinaway commented 8 years ago

Nope, I'll do it.

jchodera commented 8 years ago

Thanks!

pgrinaway commented 8 years ago

I think this is resolved.