NaegleLab / CoDIAC

GNU General Public License v3.0

PDB annotation file only reporting one chain #9

Closed: knaegle closed this issue 1 year ago

knaegle commented 1 year ago

Description

PDB annotation via generateStructureRefFile returns only one of the unique chains when a structure contains more than one protein chain.

Screenshots

[Screenshot: desired behavior, example for 2OQ1]


To Reproduce

Steps to reproduce the behavior:

  1. Generate the reference file for 2OQ1 and inspect its rows: CoDAC.PDB.generateStructureRefFile(['2OQ1'], 'test.csv')

Expected behavior

It should produce two rows: one for chain A (UniProt ID P43403) and one for chain B (UniProt ID P20963). Currently, it reports both chains A and B as belonging to P43403.
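A regression check along these lines could catch the problem. It is a minimal sketch that assumes the reference file has been loaded into a list of dicts with hypothetical column names PDB_ID, CHAINS (comma-separated chain IDs) and UNIPROT_ID. The actual columns written by generateStructureRefFile may be named differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical column names; adjust to whatever generateStructureRefFile writes.
PDB_COL, CHAINS_COL, UNIPROT_COL = "PDB_ID", "CHAINS", "UNIPROT_ID"


def chain_to_uniprot(rows: List[Dict[str, str]], pdb_id: str) -> Dict[str, str]:
    """Map each chain of pdb_id to the UniProt ID reported for it.

    A row may list several chains (e.g. "A,B"), so split them and
    assign that row's UniProt ID to each one.
    """
    mapping: Dict[str, str] = {}
    for row in rows:
        if row[PDB_COL] != pdb_id:
            continue
        for chain in row[CHAINS_COL].split(","):
            mapping[chain.strip()] = row[UNIPROT_COL]
    return mapping


def mismatches(
    rows: List[Dict[str, str]], pdb_id: str, expected: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {chain: (reported, expected)} for every chain that disagrees."""
    found = chain_to_uniprot(rows, pdb_id)
    return {
        chain: (found.get(chain), uniprot)
        for chain, uniprot in expected.items()
        if found.get(chain) != uniprot
    }


expected_2oq1 = {"A": "P43403", "B": "P20963"}

# What the issue describes today: one row assigning both chains to P43403.
buggy_rows = [{"PDB_ID": "2OQ1", "CHAINS": "A,B", "UNIPROT_ID": "P43403"}]
# Desired behavior: one row per chain.
fixed_rows = [
    {"PDB_ID": "2OQ1", "CHAINS": "A", "UNIPROT_ID": "P43403"},
    {"PDB_ID": "2OQ1", "CHAINS": "B", "UNIPROT_ID": "P20963"},
]

print(mismatches(buggy_rows, "2OQ1", expected_2oq1))  # {'B': ('P43403', 'P20963')}
print(mismatches(fixed_rows, "2OQ1", expected_2oq1))  # {}
```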

knaegle commented 1 year ago

I'm not sure where the behavior originates, but one definite problem is on line 69: using PDB_ID as the dictionary key when more than one annotation entry can be returned means only one entry per PDB ID can be stored. We will need to generate new keys (e.g. PDBID_ENTITYID) so that each unique entity in a PDB entry gets its own dictionary entry. Entity_ID is how we originally handled this.
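The key collision, and the PDBID_ENTITYID keying proposed above, can be illustrated with plain dictionaries. This is a minimal sketch, not the CoDIAC code: the records below are hypothetical stand-ins for whatever the annotation lookup returns, and the entity IDs are assumed. Only the 2OQ1 chain and UniProt assignments come from the expected behavior described in this issue.

```python
from typing import Dict, List

# Hypothetical annotation records for 2OQ1, one per entity.
# Entity IDs "1" and "2" are assumed for illustration.
entities: List[Dict[str, str]] = [
    {"pdb_id": "2OQ1", "entity_id": "1", "chains": "A", "uniprot": "P43403"},
    {"pdb_id": "2OQ1", "entity_id": "2", "chains": "B", "uniprot": "P20963"},
]

# Keyed by PDB ID alone: each assignment replaces the previous one,
# so only a single entity survives for the whole structure.
by_pdb: Dict[str, Dict[str, str]] = {}
for rec in entities:
    by_pdb[rec["pdb_id"]] = rec

# Keyed by PDBID_ENTITYID: every entity gets its own slot.
by_entity: Dict[str, Dict[str, str]] = {}
for rec in entities:
    by_entity[f"{rec['pdb_id']}_{rec['entity_id']}"] = rec

print(len(by_pdb))         # 1
print(sorted(by_entity))   # ['2OQ1_1', '2OQ1_2']
```

Whether the first or the last record survives depends on how the real loop writes to the dictionary; here the last assignment wins, whereas the report in this issue kept P43403. Either way, a single key per PDB ID cannot hold both entities.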

[Screenshot, 2023-04-25 6:57 PM]

See the attached image, and 1D4W as an example with two entities and two chains in each entity.

knaegle commented 1 year ago

Here's the current report. It clearly recognizes that there are two entities, since it lists two chain lengths, but it reports only one sequence.
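One way to keep lengths and sequences from drifting apart is to store them together per entity and derive the length from the sequence, so a report cannot list two lengths alongside a single sequence. This is a minimal sketch with a hypothetical EntityRecord type; the PDB ID "XXXX", the chain IDs, and the sequences are placeholders that only mirror the 1D4W layout (two entities, two chains each), not real data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EntityRecord:
    """Hypothetical per-entity record: one sequence shared by its chains."""
    key: str            # e.g. PDBID_ENTITYID
    chains: List[str]
    sequence: str

    @property
    def length(self) -> int:
        # Derived from the sequence, so it can never disagree with it.
        return len(self.sequence)


def report_lines(records: List[EntityRecord]) -> List[str]:
    """One tab-separated report line per entity: key, chains, length, sequence."""
    return [
        f"{r.key}\t{','.join(r.chains)}\t{r.length}\t{r.sequence}"
        for r in records
    ]


# Placeholder entities: two entities with two chains each.
records = [
    EntityRecord("XXXX_1", ["A", "B"], "MSTAGK"),
    EntityRecord("XXXX_2", ["C", "D"], "GPLGS"),
]
for line in report_lines(records):
    print(line)
```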

[Screenshot, 2023-04-25 7:02 PM]