Arcadia-Science / ProteinCartography

a pipeline to build similarity maps of protein space
MIT License

Feature request for assessment of input .pdb quality #28

Open ecpierce opened 1 year ago

ecpierce commented 1 year ago

I recently ran the pipeline with a transcription factor (Q96QS3) as the input protein. Looking at the results, there are essentially no high-quality hits: most have TM-scores < 0.2. Looking back at the input .pdb file, the initial structure is itself pretty low quality, possibly because transcription factors are enriched in intrinsically disordered regions.

Takeaways:
1) Transcription factors may be challenging to analyze with this approach.
2) It would be awesome if a feature were added to the pipeline to assess the quality of the input .pdb before running the rest of the steps, since very low-quality inputs are unlikely to yield high-quality results.

Thanks! Emily

mezarque commented 1 year ago

@braebigge and I went through and did some digging into how structure quality is embedded in the data. There's now a function in the pdb_tools.py script that extracts the structure quality info as a list (extract_structure_confidence). We'll calculate the structure quality for every protein in the dataset and provide it as a view.
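As an illustration of the idea in the comment above, here is a minimal sketch of pulling per-residue confidence out of a PDB file. It assumes an AlphaFold-style file, where the B-factor column of each ATOM record holds the pLDDT score. The function name sketch_extract_confidence is hypothetical and is not the project's extract_structure_confidence, whose actual implementation may differ.

```python
def sketch_extract_confidence(pdb_text: str) -> list[float]:
    """Return one confidence value per residue from PDB-format text.

    Assumption: the file is AlphaFold-style, so the B-factor field
    (columns 61-66 of each ATOM record) stores the per-residue pLDDT.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Skip anything that is not a full-width ATOM record.
        if not line.startswith("ATOM  ") or len(line) < 66:
            continue
        # Atom name lives in columns 13-16; keep only the alpha carbon
        # so that each residue contributes exactly one value.
        if line[12:16].strip() != "CA":
            continue
        # B-factor field, columns 61-66 (0-based slice 60:66).
        scores.append(float(line[60:66]))
    return scores
```

Reading the alpha carbon only is a design choice for brevity: in AlphaFold output every atom of a residue carries the same value, so one atom per residue is enough.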

mezarque commented 11 months ago

Assessing PDB quality for all hits and the input is now a default part of the pipeline as of #39! The pipeline won't die if the PDB quality is too low, but you'll be able to see the PDB quality as part of the view.

mezarque commented 11 months ago

Future: warn the user when the input PDB is low quality.
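One way such a warning might look is sketched below. It is not part of the pipeline: the function name warn_if_low_quality and the cutoff of 70 are assumptions (70 follows the common AlphaFold convention that pLDDT below 70 is low confidence), and it expects a list of per-residue confidence scores as input. In keeping with the behaviour described above, it warns rather than stopping the run.

```python
import warnings
from statistics import mean

# Assumed cutoff: AlphaFold treats pLDDT below 70 as low confidence.
LOW_CONFIDENCE_THRESHOLD = 70.0


def warn_if_low_quality(scores, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Emit a UserWarning when the mean confidence is below threshold.

    Returns the mean score so the caller can log or display it.
    Raises ValueError if no scores are given.
    """
    if not scores:
        raise ValueError("no confidence scores provided")
    average = mean(scores)
    if average < threshold:
        # Warn instead of raising, so the run can continue.
        warnings.warn(
            f"Input structure has low mean confidence "
            f"({average:.1f} < {threshold:.1f}); hits may be unreliable.",
            UserWarning,
            stacklevel=2,
        )
    return average
```

A mean is a coarse summary: a protein with one well-folded domain and long disordered tails could still pass, so a per-residue fraction below the cutoff may be a better signal in practice.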