NaegleLab / CoDIAC

GNU General Public License v3.0

Issue with "gaps" recorded in PDB file and Adjacency File #28

Closed: alekhyaa2 closed this issue 9 months ago

alekhyaa2 commented 10 months ago

Is your feature request related to a problem? Please describe.
A gap was found in the sequence alignment of 4U1P at position 98 (shown in the figure below). This gap is not recorded in the PDB metafile.

[Screenshot (2023-09-05): sequence alignment of 4U1P showing the gap at position 98]

Describe alternatives you've considered
The PDB structure does not contain the gap observed in our sequence alignment. We found that the adjacency file for 4U1P misrepresents 'K' as '15P', does not assign an amino acid letter to it, and appends it to the unmodelled list.


alekhyaa2 commented 9 months ago

This is not an issue with gaps in the PDB reference file.

'15P' is a small-molecule ligand in the structure. When the residue positions in the structure sequence are renumbered to the UniProt reference positions, the ligand positions are not updated (there is no reference for them), so they can overlap with the renumbered residue positions. As a result, multiple residues end up with the same residue position number. In this case, 'K' and '15P' are both assigned position 201 in the updated adjacency files.
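The collision described above can be illustrated with a minimal Python sketch. It assumes a single constant offset between the structure numbering and the UniProt numbering (a real mapping comes from a sequence alignment and need not be uniform). The positions, the offset, and the helper names `renumber` and `find_collisions` are hypothetical; they are not taken from CoDIAC or from the 4U1P entry.

```python
from collections import defaultdict


def renumber(entities, offset):
    """Shift polymer residues by `offset`; leave ligands unchanged.

    `entities` is a list of (label, position, is_ligand) tuples. Ligands keep
    their structure numbering because they have no reference position.
    """
    out = []
    for label, pos, is_ligand in entities:
        new_pos = pos if is_ligand else pos + offset
        out.append((label, new_pos, is_ligand))
    return out


def find_collisions(entities):
    """Return {position: [labels]} for every position shared by >1 entity."""
    by_pos = defaultdict(list)
    for label, pos, _ in entities:
        by_pos[pos].append(label)
    return {pos: labels for pos, labels in by_pos.items() if len(labels) > 1}


# Hypothetical toy numbering: residue 'K' at structure position 101,
# ligand '15P' numbered 201, and an assumed structure-to-UniProt offset of 100.
structure = [("G", 100, False), ("K", 101, False), ("15P", 201, True)]
renumbered = renumber(structure, offset=100)
print(find_collisions(renumbered))  # {201: ['K', '15P']}
```

Running a check like `find_collisions` after renumbering is one way to catch this kind of overlap before the adjacency file is written.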

To handle this issue, I:

  1. Updated AdjacencyFile.py so that small-molecule ligand interactions can be either included or excluded.
  2. Made it possible for the user to decide whether the residue positions are renumbered to match the UniProt reference.
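The two options in the list above can be sketched as a small filtering step. This is a hypothetical illustration, not the actual AdjacencyFile.py code: the `Entity` type, the function `prepare_contacts`, and its parameters `include_ligands`, `renumber` and `offset` are invented for this example, and a constant offset again stands in for a real alignment-based mapping.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    label: str       # one-letter residue code or ligand ID such as '15P'
    position: int    # position in the structure's own numbering
    is_ligand: bool  # True for small-molecule ligands


Contact = Tuple[Entity, Entity]


def prepare_contacts(contacts: List[Contact], include_ligands: bool = True,
                     renumber: bool = True, offset: int = 0) -> List[Contact]:
    """Hypothetical sketch of the two user options.

    include_ligands: if False, drop every contact that involves a ligand.
    renumber: if True, shift polymer residues by `offset` (assumed constant)
              to reference numbering; ligands keep their structure numbering.
    """
    def shift(e: Entity) -> Entity:
        if renumber and not e.is_ligand:
            return e._replace(position=e.position + offset)
        return e

    out = []
    for a, b in contacts:
        if not include_ligands and (a.is_ligand or b.is_ligand):
            continue  # ligand interactions excluded by the caller
        out.append((shift(a), shift(b)))
    return out


# Toy usage with hypothetical positions: excluding ligands removes the K-15P
# contact, so the renumbered 'K' (201) no longer shares a position with '15P'.
G = Entity("G", 100, False)
K = Entity("K", 101, False)
LIG = Entity("15P", 201, True)
print(prepare_contacts([(G, K), (K, LIG)], include_ligands=False, offset=100))
```

Keeping the two flags independent mirrors the list above: a user can keep ligand contacts but skip renumbering, or renumber while dropping ligands, depending on which numbering the downstream analysis needs.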