Open mikemhenry opened 1 year ago
CC @jchodera @ijpulidos @zhang-ivy
Our current atom mappings are specialized as LigandAtomMapping. How a ProteinAtomMapping or NucleicAcidAtomMapping might be implemented has been intentionally left undetermined as something that we would deal with at a later time.
The current LigandAtomMapping could potentially be used; it would just be a dict of thousands of ints, but it would work. I think, as @dwhswenson says, there's probably a more efficient and intuitive way of expressing a protein mutation mapping that uses residue concepts. Similarly, our tools for creating these mappings are geared towards small molecules (though @RiesBen has been seeing how these scale to proteins), and you might want tools that leverage the residue basis of these (e.g. perses has code that does an MCS but guarantees the amide bond is included).
In my recent work on protein mutation free energy calcs for protein:protein complexes, I use a simple mapping scheme that maps all atoms in the residue up to and including the beta carbon (but not including the beta hydrogen). I previously played around with using openeye to define the atom mapping, but could not figure out the optimal set of flags to use to define the mapping.
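For concreteness, here is a minimal sketch of the "up to and including the beta carbon" scheme described above. It is an illustration only, not the perses code: the helper name `map_up_to_beta_carbon`, the `SHARED_ATOM_NAMES` tuple, and the assumption that atoms are identified by PDB-style names are all hypothetical.

```python
# Hypothetical helper: not part of perses, gufe, or kartograf.
# Assumes PDB-style atom names; beta hydrogens (HB*) are deliberately excluded.
SHARED_ATOM_NAMES = ("N", "H", "CA", "HA", "C", "O", "CB")


def map_up_to_beta_carbon(old_atoms, new_atoms, shared=SHARED_ATOM_NAMES):
    """Map atoms of a mutated residue up to and including the beta carbon.

    old_atoms / new_atoms: dict of atom name -> atom index for one residue.
    Returns a dict of old atom index -> new atom index.
    Names missing from either residue (e.g. CB in glycine) are skipped.
    """
    return {
        old_atoms[name]: new_atoms[name]
        for name in shared
        if name in old_atoms and name in new_atoms
    }


# Example: ALA -> SER, with indices assigned by position in each residue.
ala = {n: i for i, n in enumerate(
    ["N", "H", "CA", "HA", "CB", "HB1", "HB2", "HB3", "C", "O"])}
ser = {n: i for i, n in enumerate(
    ["N", "H", "CA", "HA", "CB", "HB2", "HB3", "OG", "HG", "C", "O"])}

mapping = map_up_to_beta_carbon(ala, ser)
```

Everything past the beta carbon, including the beta hydrogens that both residues happen to share by name, is left unmapped and would be treated as unique to one end state.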
Trying to revive the discussion here. CC @IAlibay
In general, it has been seen (by using kartograf) that we could map proteins using LigandAtomMapping, which raises the question of what it is that makes a LigandAtomMapping different from a ProteinAtomMapping. Now, changing this is potentially an API-breaking change, depending on how we implement things.
One thing is, if we want to rely on the residue concept and this hierarchy, I see this as relying on "types" and some kind of "template library", which, in my opinion, has been shown to be a limiting approach, even if computationally efficient. Think about non-standard AAs, post-translational modifications, or similar; those could be problematic with this approach.
On the other hand, we could just rely on the underlying implementations (rdkit and kartograf in this case). For big systems this could take a significant amount of time, but it should be more robust.
EDIT: What is a good target time for generating protein mappings for our needs?
There seems to be room to have a generic (not abstract) AtomMapping class that can be used for general atom-based maps. Then the specific Ligand, Protein, other component mappings could inherit from this, if needed.
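A rough sketch of what such a hierarchy could look like, to make the proposal easier to discuss. This is not the gufe API: the class names `GenericAtomMapping` and `ProteinMapping`, the field names, and the use of plain strings for components are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GenericAtomMapping:
    """Hypothetical concrete (non-abstract) atom-index mapping between A and B."""
    component_a: str          # placeholder for a real component object
    component_b: str
    a_to_b: Dict[int, int]    # atom index in A -> atom index in B

    def __post_init__(self):
        # A valid mapping is one-to-one: no two A atoms may share a B atom.
        if len(set(self.a_to_b.values())) != len(self.a_to_b):
            raise ValueError("two atoms in A map to the same atom in B")

    def b_to_a(self) -> Dict[int, int]:
        """Return the inverse mapping."""
        return {b: a for a, b in self.a_to_b.items()}

    def unique_to_a(self, n_atoms_a: int) -> List[int]:
        """Atom indices in A that have no partner in B."""
        return [i for i in range(n_atoms_a) if i not in self.a_to_b]


@dataclass
class ProteinMapping(GenericAtomMapping):
    """Hypothetical specialization carrying residue-level metadata."""
    mutated_residues: Tuple[int, ...] = ()


m = ProteinMapping("wild_type", "mutant", {0: 0, 1: 2}, mutated_residues=(42,))
```

With this shape, code that only needs index pairs can accept the generic class, while residue-aware tooling can require the specialized one.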
@ijpulidos it may be that we need to move some of this discussion to https://github.com/OpenFreeEnergy/gufe/issues/342
We will want to support mutations of proteins (and other targets like RNA). I am not sure if we want to support this via a mapping method, or have something simpler like a residue ID and then the amino (or base) you want to mutate that ID into.
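To illustrate the simpler alternative, a mutation could be specified by nothing more than a residue ID and a target residue name. The sketch below is hypothetical: `MutationSpec`, `apply_mutation`, and the representation of a sequence as a dict of residue ID to three-letter name are assumptions, not existing gufe or perses objects.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MutationSpec:
    """Hypothetical mutation request: which residue to change, and into what."""
    residue_id: int
    new_residue: str       # three-letter residue (or base) name, e.g. "ALA"
    chain_id: str = "A"


def apply_mutation(sequence: Dict[int, str], spec: MutationSpec) -> Dict[int, str]:
    """Return a copy of `sequence` (residue ID -> name) with the mutation applied."""
    if spec.residue_id not in sequence:
        raise KeyError(f"residue {spec.residue_id} not found in sequence")
    mutated = dict(sequence)  # leave the input untouched
    mutated[spec.residue_id] = spec.new_residue
    return mutated


wild_type = {41: "GLY", 42: "PHE", 43: "SER"}
mutant = apply_mutation(wild_type, MutationSpec(residue_id=42, new_residue="ALA"))
```

A spec like this says nothing about which atoms correspond, so an atom mapping would still have to be derived from it at some point; the question is whether users see that step.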