NaegleLab / CoDIAC

GNU General Public License v3.0

Error generating consistent SH2 domain sequences across PDB structures of same gene #24

Closed alekhyaa2 closed 1 month ago

alekhyaa2 commented 10 months ago

Is your feature request related to a problem? Please describe.

The SH2 domain FASTA sequences generated across the available PDB structures for the genes ABL1 and HCK have inconsistent lengths.

  1. For ABL1: the "CANNONICAL_SEQ_BEG_POSITION" value does not match the start position in the "reference range" column of the PDB_reference metafile, which shifts the start positions of the generated sequences. The attached snapshot shows the difference in start positions for PDBs 1OPL, 6AMV, and 6AMW. A second source of the sequence difference is the SH2 domain boundary: the structures with errors (1OPL, 6AMV, 6AMW) use the boundary 123-215, whereas the other ABL1 structures use 125-217.
[Screenshot: Screen Shot 2023-08-31 at 10 26 57 AM]
  2. For HCK: although the start position in the "reference range" column matches "CANNONICAL_SEQ_BEG_POSITION", the sequences for PDB structures 2HCK, 1AD5, and 1QCF are still shifted by one amino acid.
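
To illustrate how a mismatched start position shifts an extracted domain, here is a minimal sketch on a toy sequence. It is not CoDIAC's code: `extract_domain`, its parameters, and the toy data are hypothetical, and it assumes 1-based, inclusive positions with no gaps in the structure sequence.

```python
def extract_domain(struct_seq, seq_start, domain_start, domain_end):
    """Slice a domain out of struct_seq (hypothetical helper).

    seq_start is the reference position (1-based) of the first residue in
    struct_seq; domain_start and domain_end are inclusive reference positions.
    """
    if domain_start < seq_start:
        raise ValueError("domain starts before the structure sequence")
    offset = domain_start - seq_start
    length = domain_end - domain_start + 1
    domain = struct_seq[offset:offset + length]
    if len(domain) != length:
        raise ValueError("structure sequence ends before the domain does")
    return domain


# Toy reference; the structure covers reference positions 3..22.
reference = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
struct_seq = reference[2:22]

# Correct start position (3): positions 5..10 of the reference.
correct = extract_domain(struct_seq, 3, 5, 10)

# Wrong start position (1 instead of 3): same length, shifted by two residues.
shifted = extract_domain(struct_seq, 1, 5, 10)

print(correct, shifted)
```

With the wrong start position the slice keeps its length but moves by the size of the mismatch, which is the kind of shift described for the ABL1 and HCK structures above.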

Describe alternatives you've considered

I tried manually changing the start position in the "reference range" column for the ABL1 structures, but then the contactmap class cannot assign a refseq, because the lengths of structseq and refseq are no longer equal.
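
The length problem can be seen with simple range arithmetic. The sketch below is illustrative only: `parse_range`, `range_length`, and `lengths_agree` are hypothetical helpers, not part of CoDIAC, and it assumes ranges are written as inclusive "start-end" strings. It uses the two ABL1 domain boundaries quoted in this issue purely as example numbers.

```python
def parse_range(text):
    """Parse an inclusive 'start-end' range string into two ints."""
    start, end = (int(part) for part in text.strip().split("-"))
    if end < start:
        raise ValueError(f"range end precedes start: {text!r}")
    return start, end


def range_length(text):
    """Number of residues covered by an inclusive 'start-end' range."""
    start, end = parse_range(text)
    return end - start + 1


def lengths_agree(range_text, struct_seq):
    """True when the range covers exactly as many residues as struct_seq."""
    return range_length(range_text) == len(struct_seq)


# Both boundaries quoted in the issue span the same number of residues,
# so they differ by a shift of two positions, not by length.
len_a = range_length("123-215")
len_b = range_length("125-217")

# Editing only the start of a range changes its length, so it no longer
# matches a sequence of the original length (placeholder residues here).
placeholder_seq = "A" * len_a
edited_ok = lengths_agree("125-215", placeholder_seq)

print(len_a, len_b, edited_ok)
```

A check like this, run before building a contact map, would flag a range whose length no longer matches the sequence it is meant to describe.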

Tasks

Include specific tasks in the order they need to be done in. Include links to specific lines of code where the task should happen at.