NaegleLab / CoDIAC

GNU General Public License v3.0

Inconsistent Gene name data recorded in PDB Metafile #26

Closed alekhyaa2 closed 1 month ago

alekhyaa2 commented 10 months ago

Is your feature request related to a problem? Please describe.

The columns generated by PDB.py and IntegrateStructure_Reference.py do not contain consistent results.

The 'GENE_NAME' and 'gene name' columns are shown in the figure below. The main difference is that ligand entities are recorded as '-1' in the 'gene_name' column, whereas the known gene name of the ligand molecule is recorded in the 'GENE_NAME' column.

[Figure: screenshot comparing the two gene name columns (Screen Shot 2023-09-04 at 10 53 06 PM)]


knaegle commented 10 months ago

Documentation will clear this up, as will changing the entry information. What is going on here is that ligands that are not in the SH2 domain reference file cannot be analyzed, so the fields requiring integration are given a '-1' error code. Instead, let's update this to something more informative, like "NR" for "not in reference file".
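For illustration, the relabeling proposed above could look roughly like the sketch below. It is not taken from the CoDIAC code: the function name `relabel_missing_reference`, the row-as-dict layout, and the example values are all assumptions. The only details from this thread are the '-1' code, the "NR" label, and the two column names.

```python
# Hypothetical sketch: swap the legacy '-1' error code for 'NR' in the
# fields that require integration with the reference file.
NOT_IN_REFERENCE = "NR"    # "not in reference file"
LEGACY_ERROR_CODE = "-1"   # code currently written for unanalyzable ligands


def relabel_missing_reference(rows, integration_fields):
    """Return copies of the rows with '-1' replaced by 'NR'.

    Only the fields listed in integration_fields are touched, so a '-1'
    in any other column is left unchanged.
    """
    relabeled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        for field in integration_fields:
            if str(new_row.get(field)) == LEGACY_ERROR_CODE:
                new_row[field] = NOT_IN_REFERENCE
        relabeled.append(new_row)
    return relabeled


# Made-up example rows; the gene names are placeholders, not real data.
example_rows = [
    {"GENE_NAME": "GENE_A", "gene name": "GENE_A"},
    {"GENE_NAME": "GENE_B", "gene name": "-1"},
]
relabeled_rows = relabel_missing_reference(example_rows, ["gene name"])
```

In this sketch only the second row changes, and only in its 'gene name' field.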

knaegle commented 10 months ago

Update: We discussed this and agreed to put "Integration:" in front of all fields coming from the UniProt reference. This will also make it clear that this gene name is different from the other.
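A rough sketch of that renaming, assuming each metafile row is handled as a plain dict. The helper `prefix_reference_fields` and the field list passed to it are hypothetical, and whether a separator follows "Integration:" is not specified in this thread, so none is added here.

```python
# Hypothetical sketch: prepend "Integration:" to the fields that come
# from the UniProt reference, leaving all other fields as they are.
INTEGRATION_PREFIX = "Integration:"


def prefix_reference_fields(row, reference_fields, prefix=INTEGRATION_PREFIX):
    """Return a new dict with the reference-derived keys renamed."""
    renamed = {}
    for key, value in row.items():
        if key in reference_fields:
            renamed[prefix + key] = value
        else:
            renamed[key] = value
    return renamed


# Made-up example row; the values are placeholders, not real data.
example_row = {"GENE_NAME": "GENE_A", "gene name": "NR"}
prefixed_row = prefix_reference_fields(example_row, {"gene name"})
```

With the prefix in place, 'GENE_NAME' and 'Integration:gene name' can no longer be mistaken for the same field.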