ayushkarnawat / profit

Exploring evolutionary protein fitness landscapes
MIT License

Support for mutating arbitrary sequences #7

Open ayushkarnawat opened 4 years ago

ayushkarnawat commented 4 years ago

The PDBMutator currently only modifies amino acid residues at specified positions for a given Protein Data Bank (PDB) ID. Instead, it should be able to mutate any sequence or structure passed in. First, we have to check whether the value passed as the first parameter of PDBMutator().modify_residues() is (a) a valid PDB ID, (b) a PDB file, (c) an arbitrary structure, or (d) an arbitrary sequence.
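A minimal sketch of how that first check might look. Everything here is hypothetical and not part of the current PDBMutator API: the `InputType` enum, the `classify_input` helper, and the detection rules themselves. It assumes a PDB ID is four characters (a digit 1-9 followed by three alphanumerics), that files are recognized by a `.pdb` or `.sdf` suffix without checking that they exist, that a sequence uses only the 20 standard one-letter residue codes, and that any non-string object is an in-memory structure.

```python
import re
from enum import Enum
from pathlib import Path


class InputType(Enum):
    """Hypothetical labels for the four kinds of input described above."""
    PDB_ID = "pdb_id"
    FILE = "file"
    STRUCTURE = "structure"
    SEQUENCE = "sequence"


# Assumption: classic 4-character PDB IDs (digit 1-9, then 3 alphanumerics).
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")
# Assumption: only the 20 standard one-letter amino acid codes are allowed.
_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Assumption: files are recognized by suffix only; existence is not checked.
_FILE_SUFFIXES = {".pdb", ".sdf"}


def classify_input(value):
    """Guess which kind of input was passed to the mutator."""
    if not isinstance(value, (str, Path)):
        # Assumption: any non-string object is an in-memory structure.
        return InputType.STRUCTURE
    text = str(value)
    if Path(text).suffix.lower() in _FILE_SUFFIXES:
        return InputType.FILE
    # PDB IDs start with a digit, so they cannot collide with sequences.
    if _PDB_ID.fullmatch(text):
        return InputType.PDB_ID
    if text and set(text.upper()) <= _AMINO_ACIDS:
        return InputType.SEQUENCE
    raise ValueError(f"Cannot determine input type of {value!r}")
```

For example, `classify_input("3GFT")` would be treated as a PDB ID and `classify_input("MKTAYIAK")` as a sequence. A real implementation would likely also verify that the file exists and that the ID resolves.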

Depending on which type is passed in, the mutator will check whether it can mutate to the format the user specified. The table below details which output formats are compatible with each input type.

| Input type | Can mutate to |
| --- | --- |
| PDB ID | Primary, Tertiary |
| PDB/SDF file | Primary, Tertiary |
| Structure | Primary, Tertiary |
| Sequence | Primary |

Note that if only a sequence is passed in, the mutator can only produce the primary format. A bare sequence of residue names carries no 3D information about the protein, so no tertiary amino acid mutations can be made.
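The compatibility rules could be encoded as a simple lookup. This is only a sketch: the string keys, the `COMPATIBLE_FORMATS` mapping, and the `can_mutate` helper are hypothetical names chosen for illustration, not existing code in the repository.

```python
# Hypothetical mapping from input type to the formats it can be mutated to.
COMPATIBLE_FORMATS = {
    "pdb_id": {"primary", "tertiary"},
    "file": {"primary", "tertiary"},
    "structure": {"primary", "tertiary"},
    # A bare sequence has no 3D coordinates, so tertiary output is impossible.
    "sequence": {"primary"},
}


def can_mutate(input_type, target_format):
    """Return True if `input_type` can be mutated to `target_format`."""
    if input_type not in COMPATIBLE_FORMATS:
        raise ValueError(f"Unknown input type: {input_type!r}")
    return target_format in COMPATIBLE_FORMATS[input_type]
```

With this, `can_mutate("sequence", "tertiary")` is `False`, while every other combination of input type and format listed above is `True`. The mutator could raise a descriptive error before doing any work when the check fails.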

ayushkarnawat commented 4 years ago

Due to the variability in input types, we might have to either (1) migrate from PyMOL's mutagenesis wizard to a more general-purpose mutator (see in-house built mutator #4), or (2) convert each input into a PDB-file-like object and read/mutate it via PyMOL's wizard.

The latter is easier to implement, as it relies on a heavily tested mutagenesis system, albeit with less control over how mutations are performed. In particular, we will have no say in how the bond lengths, bond angles, and dihedral (torsion) angles are chosen for the rotamers of the mutated residues.