Closed alekhyaa2 closed 9 months ago
We discussed this and found that the issue is an incorrect dictionary with respect to how we currently load globals.
@alekhyaa2 Instead, we want to pull the PTM mapping from the CIF file, i.e. the mapping from each modified 3-letter code to the corresponding amino acid. For example, that detail for 5X94 can be seen here.
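A minimal sketch of what pulling that mapping could look like, assuming the structure's mmCIF file lists its modified residues in a `_pdbx_struct_mod_residue` loop with `label_comp_id` and `parent_comp_id` columns. The function name `read_mod_residue_map` and the inline sample are hypothetical: the sample is an illustrative fragment, not copied from 5X94, and the PTR row is invented. Files that store a single modified residue as key-value pairs rather than a loop are not handled here.

```python
import shlex

# Illustrative mmCIF fragment (assumption: not the real 5X94 content).
SAMPLE_CIF = """\
loop_
_pdbx_struct_mod_residue.id
_pdbx_struct_mod_residue.label_asym_id
_pdbx_struct_mod_residue.label_comp_id
_pdbx_struct_mod_residue.label_seq_id
_pdbx_struct_mod_residue.parent_comp_id
_pdbx_struct_mod_residue.details
1 A MSE 171 MET 'modified residue'
2 A PTR 62 TYR 'modified residue'
#
"""


def read_mod_residue_map(cif_text):
    """Return {modified 3-letter code: parent 3-letter code} (hypothetical helper)."""
    prefix = "_pdbx_struct_mod_residue."
    columns = []
    mapping = {}
    in_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line == "loop_":
            # A new loop starts: reset the column list.
            in_loop = True
            columns = []
            continue
        if not in_loop or not line:
            continue
        if line.startswith("#"):
            # '#' closes the current category in PDB-style mmCIF files.
            in_loop = False
            continue
        if line.startswith("_"):
            columns.append(line)
            continue
        # Data row: only use it if the loop belongs to the category we want.
        if columns and all(c.startswith(prefix) for c in columns):
            values = shlex.split(line)  # keeps quoted values as one token
            if len(values) != len(columns):
                continue  # skip rows this simple tokenizer cannot handle
            row = dict(zip((c[len(prefix):] for c in columns), values))
            mapping[row["label_comp_id"]] = row["parent_comp_id"]
    return mapping


print(read_mod_residue_map(SAMPLE_CIF))
```

For real files, a full mmCIF parser is a safer choice than this line-based tokenizer, which does not cover multi-line values or apostrophes inside quoted strings.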
AdjacenyFiles.makePTM_dict can now generate a dict with all the PTMs present across the PDB structures in the reference. For now, we will use this as input to the contcatmapClass. We also identified residues with zero occupancy that are currently reported as new amino acids when running contactmap. The CIF files do not have a one-letter code for such residues, although the PDB structures use a different letter to represent them.
Is your feature request related to a problem? Please describe. PTM_CONTACT_DICT, in the globals class within contactMap.py, replaces each PTM with a one-letter code. PTR is given Y, which is also the code for TYR. MSE is given S, which is also used for SER.
5X94 (PTPN11) has an MSE at position 171, whereas other PTPN11 structures (5X7B, 5DF6, 6R5G) do not have this modification at the same position.
In the downstream analysis of the FASTA and feature files, these sequences look like mutants, and we cannot tell whether a difference is a modification or a mutation.
The above screenshot highlights the difference in red.
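The collision described in this request can be reproduced with a small sketch. The two-entry `PTM_CONTACT_DICT` below is a stand-in containing only the PTR and MSE entries quoted above, not the real dictionary from contactMap.py. The helper names and the idea of returning the modifications as a separate position map are assumptions for illustration, not the project's implementation.

```python
# Stand-in with only the two entries quoted in this issue (assumption).
PTM_CONTACT_DICT = {"PTR": "Y", "MSE": "S"}

# Small subset of the standard 3-letter to 1-letter table, enough for the demo.
STANDARD = {"GLY": "G", "MET": "M", "SER": "S", "TYR": "Y"}


def to_one_letter_lossy(residues):
    """Current behaviour as described: a PTM collapses to a plain letter."""
    return "".join(
        PTM_CONTACT_DICT.get(res) or STANDARD.get(res, "X") for res in residues
    )


def to_one_letter_annotated(residues, parent_map):
    """Hypothetical alternative: emit the parent residue's letter and
    record each modification separately, keyed by 1-based position."""
    letters = []
    modifications = {}
    for position, res in enumerate(residues, start=1):
        if res in parent_map:
            letters.append(STANDARD.get(parent_map[res], "X"))
            modifications[position] = res
        else:
            letters.append(STANDARD.get(res, "X"))
    return "".join(letters), modifications


# Parent mapping of the kind the CIF file provides (MSE -> MET, PTR -> TYR).
parent_map = {"MSE": "MET", "PTR": "TYR"}

modified = ["GLY", "MSE", "TYR"]    # toy fragment with a selenomethionine
serine = ["GLY", "SER", "TYR"]      # toy fragment with a genuine serine

# Both fragments produce the same string, so MSE is indistinguishable from SER.
print(to_one_letter_lossy(modified), to_one_letter_lossy(serine))

# Keeping the parent letter plus a side table preserves the distinction.
print(to_one_letter_annotated(modified, parent_map))
```

With the side table, a downstream comparison against an unmodified structure sees the same parent sequence and can report position 2 as a modification rather than a mutation.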