NaegleLab / CoDIAC

GNU General Public License v3.0

Handling PTM notations in contactMap class #36

Closed: alekhyaa2 closed this issue 9 months ago.

alekhyaa2 commented 9 months ago

Is your feature request related to a problem? Please describe.

PTM_CONTACT_DICT in the globals class within contactMap.py replaces each PTM with a one-letter code. PTR (phosphotyrosine) is given Y, which is also the code for TYR. MSE (selenomethionine) is given S, which is also the code for SER, even though the parent residue of MSE is MET (M).
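A one-letter code on its own cannot record that a residue was modified: once PTR is written as Y, it looks exactly like TYR. A minimal sketch of one way around this, assuming a hypothetical helper to_one_letter (not part of CoDIAC) and a deliberately truncated residue table, keeps the PTM positions alongside the sequence instead of discarding them:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, truncated three-letter -> one-letter table for illustration only.
STANDARD = {"TYR": "Y", "SER": "S", "MET": "M", "GLY": "G", "LYS": "K"}


def to_one_letter(residues: List[str], ptm_parent: Dict[str, str]) -> Tuple[str, Set[int]]:
    """Convert three-letter residue names to a one-letter sequence.

    Modified residues are translated through ptm_parent (modified code ->
    parent three-letter code), and their 0-based positions are returned
    separately so the modification is not lost in the sequence string.
    """
    letters = []
    ptm_positions = set()
    for i, res in enumerate(residues):
        if res in STANDARD:
            letters.append(STANDARD[res])
        elif res in ptm_parent:
            # Write the parent amino acid, but remember that this site was modified.
            letters.append(STANDARD[ptm_parent[res]])
            ptm_positions.add(i)
        else:
            letters.append("X")  # residue not covered by either table
    return "".join(letters), ptm_positions
```

For example, to_one_letter(["GLY", "PTR"], {"PTR": "TYR"}) and to_one_letter(["GLY", "TYR"], {}) both yield the sequence "GY"; only the returned position set ({1} versus an empty set) shows that the first structure carries a phosphotyrosine.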

5X94 (PTPN11) has an MSE at position 171, whereas other PTPN11 structures (5X7B, 5DF6, 6R5G) do not have this modification at the same position.

In downstream analysis of the FASTA and feature files, these sequences therefore look like mutants, and we cannot tell whether the difference is a modification or a mutation.

[Screenshot 2023-09-12 at 9:00:20 PM]

The above screenshot highlights the difference in red.
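If the PTM positions are kept next to each sequence, a comparison can label each difference instead of leaving it ambiguous. The sketch below is hypothetical (classify_differences is not a CoDIAC function) and assumes the two sequences are already aligned to the same length with no gaps, with PTM positions given as 0-based indices into those sequences:

```python
from typing import List, Set, Tuple


def classify_differences(
    seq_a: str, ptm_a: Set[int], seq_b: str, ptm_b: Set[int]
) -> List[Tuple[int, str, str, str]]:
    """Compare two aligned, equal-length sequences position by position.

    Returns (position, residue_a, residue_b, label) for each position where the
    letters differ or only one structure carries a PTM. The label is
    "modification" if either structure has a PTM at that position, otherwise
    "mutation".
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        letters_differ = a != b
        ptm_differs = (i in ptm_a) != (i in ptm_b)
        if not (letters_differ or ptm_differs):
            continue  # identical residue and identical PTM status
        has_ptm = i in ptm_a or i in ptm_b
        results.append((i, a, b, "modification" if has_ptm else "mutation"))
    return results
```

With a toy pair such as "GSK" (PTM at position 1) versus "GMK" (no PTMs), the mismatch is reported as a modification rather than a mutation, even under an incorrect S mapping. With a correct parent mapping both sequences read "GMK" and the site is still reported, because only one structure carries the PTM.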

knaegle commented 9 months ago

We discussed this and found that the problem is an incorrect dictionary, given how we currently load globals.

[Screenshot 2023-09-21 at 2:24:39 PM]

@alekhyaa2 Instead, we want to pull the PTM mapping from the CIF file, that is, the mapping from each modified residue's three-letter code to its parent amino acid. For example, that detail for 5X94 can be seen here.
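A minimal sketch of reading that mapping, assuming it comes from the mmCIF _pdbx_struct_mod_residue category (label_comp_id for the modified residue, parent_comp_id for its parent) and that the file has already been parsed into a key-to-values dictionary such as the one Biopython's MMCIF2Dict returns. The function name ptm_parents_from_cif is hypothetical and not part of CoDIAC:

```python
from typing import Dict, List, Mapping, Union

CifValue = Union[str, List[str]]


def _as_list(value: CifValue) -> List[str]:
    # Some parsers return a bare string for single-row categories; normalise to a list.
    return [value] if isinstance(value, str) else list(value)


def ptm_parents_from_cif(cif: Mapping[str, CifValue]) -> Dict[str, str]:
    """Map each modified residue code to its parent amino acid code.

    Reads the _pdbx_struct_mod_residue category and returns {} if the entry
    lists no modified residues. Raises ValueError if the same modified code
    is given two different parents within one file.
    """
    key_mod = "_pdbx_struct_mod_residue.label_comp_id"
    key_parent = "_pdbx_struct_mod_residue.parent_comp_id"
    if key_mod not in cif or key_parent not in cif:
        return {}
    mapping: Dict[str, str] = {}
    for mod, parent in zip(_as_list(cif[key_mod]), _as_list(cif[key_parent])):
        existing = mapping.get(mod)
        if existing is not None and existing != parent:
            raise ValueError(f"{mod} has conflicting parents: {existing}, {parent}")
        mapping[mod] = parent
    return mapping
```

In use this might look like ptm_parents_from_cif(MMCIF2Dict("5x94.cif")), where the local file path is an assumption. The resulting parent codes can then be converted to one-letter codes with a standard amino acid table, instead of hard-coding letters in globals.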

alekhyaa2 commented 9 months ago

AdjacencyFiles.makePTM_dict can now generate a dict of all the PTMs present across the PDB structures in the reference. For now, we will use this as the input to the contactMap class. We also identified residues with zero occupancy that are currently reported as new amino acids encountered while running contactMap. The CIF files do not have a one-letter code for such residues, although the PDB structures use a different letter to represent them.
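The two steps described above can be sketched as follows. This is not the makePTM_dict implementation: merge_ptm_dicts and zero_occupancy_residues are hypothetical names, the per-structure input is assumed to be a {structure ID: {modified code: parent code}} dictionary, and the atom columns are assumed to come from the mmCIF _atom_site category as parallel lists of strings with numeric occupancy values:

```python
from typing import Dict, List, Mapping, Set, Tuple


def merge_ptm_dicts(per_structure: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Combine per-structure {modified: parent} maps into one reference-wide map.

    Raises ValueError if two structures disagree about a residue's parent.
    """
    merged: Dict[str, str] = {}
    first_seen_in: Dict[str, str] = {}
    for structure_id, mapping in per_structure.items():
        for mod, parent in mapping.items():
            if mod in merged and merged[mod] != parent:
                raise ValueError(
                    f"{mod}: {first_seen_in[mod]} gives {merged[mod]}, "
                    f"{structure_id} gives {parent}"
                )
            merged[mod] = parent
            first_seen_in.setdefault(mod, structure_id)
    return merged


def zero_occupancy_residues(atom_site: Mapping[str, List[str]]) -> Set[Tuple[str, str, str]]:
    """Return (chain, residue number, residue name) for residues whose atoms all have occupancy 0.

    Insertion codes and alternate locations are ignored in this sketch.
    """
    keys = zip(
        atom_site["_atom_site.auth_asym_id"],
        atom_site["_atom_site.auth_seq_id"],
        atom_site["_atom_site.auth_comp_id"],
    )
    max_occ: Dict[Tuple[str, str, str], float] = {}
    for key, occ in zip(keys, atom_site["_atom_site.occupancy"]):
        # Track the highest occupancy seen for each residue.
        max_occ[key] = max(max_occ.get(key, 0.0), float(occ))
    return {res for res, occ in max_occ.items() if occ == 0.0}
```

Raising on a conflicting parent, rather than silently keeping the last value, makes a bad mapping visible before it reaches the contactMap class. Listing zero-occupancy residues up front allows them to be handled explicitly instead of surfacing as unknown amino acids.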