Support Protein Mutations

mikemhenry commented 1 year ago

We will want to support mutations of proteins (and other targets like RNA). I am not sure if we want to support this via a mapping method, or have something simpler, like a residue ID plus the amino acid (or base) you want to mutate that residue into.
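
To make the simpler option concrete, here is a minimal sketch of what a residue-ID-based specification could look like. The `ResidueMutation` class, its fields, and `apply_mutation` are all hypothetical names for illustration, not part of gufe, and the sketch only covers the 20 canonical amino acids (bases would need their own alphabet).

```python
from dataclasses import dataclass

# One-letter codes for the 20 canonical amino acids.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class ResidueMutation:
    """Hypothetical spec: mutate residue `resid` on `chain` to `new_residue`."""
    chain: str
    resid: int
    new_residue: str  # one-letter amino acid code

    def __post_init__(self):
        if self.new_residue not in CANONICAL_AA:
            raise ValueError(f"unknown amino acid code: {self.new_residue!r}")


def apply_mutation(sequence, mutation):
    """Return a copy of a {(chain, resid): code} sequence with the mutation applied."""
    key = (mutation.chain, mutation.resid)
    if key not in sequence:
        raise KeyError(f"no residue {key} in sequence")
    mutated = dict(sequence)  # leave the input untouched
    mutated[key] = mutation.new_residue
    return mutated
```

A spec like this says nothing about which atoms correspond between the two end states, so an atom mapping would still have to be derived from it somewhere downstream.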

mikemhenry commented 1 year ago

CC @jchodera @ijpulidos @zhang-ivy

dwhswenson commented 1 year ago

Our current atom mappings are specialized as LigandAtomMapping. How a ProteinAtomMapping or NucleicAcidAtomMapping might be implemented has been intentionally left undetermined as something that we would deal with at a later time.

richardjgowers commented 1 year ago

The current LigandAtomMapping could potentially be used; it would just be a dict of thousands of ints, but it would work. As @dwhswenson says, there's probably a more efficient and intuitive way of expressing a protein mutation mapping that uses residue concepts. Similarly, our tools for creating these mappings are geared towards small molecules (though @RiesBen has been looking at how these scale to proteins), and you might want tools that leverage the residue basis of these systems (e.g. perses has code that does an MCS (maximum common substructure) search but guarantees the amide bond is included).
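
As a rough illustration of the "dict of ints" idea, the sketch below builds an index-to-index mapping for every atom outside the mutated residue by matching on residue ID and atom name. The function name and the `(resid, atom_name)` tuple representation are assumptions made for this example; this is not how gufe or kartograf build mappings.

```python
def index_mapping_outside_mutation(atoms_a, atoms_b, mutated_resid):
    """Map atom indices of A to B for every residue except the mutated one.

    atoms_a / atoms_b are lists of (resid, atom_name) tuples, one per atom,
    in topology order (a simplified, hypothetical representation).
    """
    lookup_b = {key: j for j, key in enumerate(atoms_b)}
    mapping = {}
    for i, (resid, name) in enumerate(atoms_a):
        if resid == mutated_resid:
            continue  # the mutated residue needs its own mapping logic
        j = lookup_b.get((resid, name))
        if j is not None:
            mapping[i] = j
    return mapping
```

For a real protein this dict would hold thousands of entries that are almost all trivial, which is the redundancy a residue-aware representation could avoid.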

zhang-ivy commented 1 year ago

In my recent work on protein mutation free energy calcs for protein:protein complexes, I use a simple mapping scheme that maps all atoms in the residue up to and including the beta carbon (but not including the beta hydrogen). I previously played around with using OpenEye to define the atom mapping, but could not figure out the optimal set of flags to use.
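
A minimal sketch of a scheme along these lines, assuming PDB-style atom names: atoms of the old and new residue are matched by name, restricted to a core set that stops at the beta carbon. The `CORE_ATOM_NAMES` tuple and the `map_residue_core` function are illustrative guesses, not the code used in the work described above.

```python
# Assumed atom names for "up to and including the beta carbon";
# hydrogens attached to CB are deliberately left out.
CORE_ATOM_NAMES = ("N", "H", "CA", "HA", "C", "O", "CB")


def map_residue_core(names_a, names_b, core=CORE_ATOM_NAMES):
    """Return {index_in_a: index_in_b} for core atoms present in both residues.

    names_a / names_b are the atom names of the old and new residue,
    in the order they appear in each topology.
    """
    index_b = {name: j for j, name in enumerate(names_b)}
    return {
        i: index_b[name]
        for i, name in enumerate(names_a)
        if name in core and name in index_b
    }
```

Because only names present in both residues are mapped, cases such as glycine (no CB) or proline (no backbone H) simply end up with fewer mapped atoms; whether that is the desired behaviour would need checking.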

ijpulidos commented 2 months ago

Trying to revive the discussion here. CC @IAlibay

In general, it has been seen (using kartograf) that we could map proteins using LigandAtomMapping, which raises the question of what makes a LigandAtomMapping different from a ProteinAtomMapping. Changing this is potentially an API-breaking change, depending on how we implement things.

One consideration is whether we want to rely on the residue concept and its hierarchy. I see this as relying on "types" and some kind of "template library", which, in my opinion, has been shown to be a limiting approach, even if computationally efficient: non-standard amino acids, post-translational modifications, and similar cases could be problematic.

On the other hand, we could just rely on the underlying implementations (rdkit and kartograf in this case). For big systems this could take a significant amount of time, but it should be more robust.

EDIT: What is a good target time for generating protein mappings for our needs?

ijpulidos commented 2 months ago

There seems to be room to have a generic (not abstract) AtomMapping class that can be used for general atom-based maps. The specific Ligand, Protein, or other component mappings could then inherit from this, if needed.
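
A rough sketch of that layout, using plain strings as stand-ins for components. `GenericAtomMapping`, `LigandMapping`, `ProteinMapping`, and `mapped_residues` are hypothetical names chosen so they do not clash with gufe's existing classes; nothing here reflects the actual gufe API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GenericAtomMapping:
    """Concrete, general-purpose atom-index mapping between two components."""
    component_a: str
    component_b: str
    a_to_b: Dict[int, int] = field(default_factory=dict)

    def __post_init__(self):
        # Two atoms of A must not map onto the same atom of B.
        targets = list(self.a_to_b.values())
        if len(targets) != len(set(targets)):
            raise ValueError("mapping must be one-to-one")

    @property
    def b_to_a(self):
        """Inverse mapping, from indices in B to indices in A."""
        return {b: a for a, b in self.a_to_b.items()}


class LigandMapping(GenericAtomMapping):
    """Small-molecule mapping; adds nothing beyond the generic behaviour here."""


class ProteinMapping(GenericAtomMapping):
    """Protein mapping with an optional residue-level helper."""

    def mapped_residues(self, atom_to_resid_a):
        """Sorted residue IDs in A that have at least one mapped atom."""
        return sorted({atom_to_resid_a[i] for i in self.a_to_b})
```

With this split, code that only needs the index dict can accept the generic class, while residue-aware helpers live on the protein subclass and stay optional.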

IAlibay commented 2 months ago

@ijpulidos it may be that we need to move some of this discussion to https://github.com/OpenFreeEnergy/gufe/issues/342

OpenFreeEnergy / gufe

Support Protein Mutations #212