Arcadia-Science / ProteinCartography

a pipeline to build similarity maps of protein space
MIT License

Determine the query source of each PDB #15

Closed mezarque closed 1 year ago

mezarque commented 1 year ago

For each structure in the final analysis, we can label how the original structure was found (e.g., as a BLAST hit, a Foldseek hit, or both). This could be an interesting variable to visualize in the final plot.
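As a rough illustration of the labeling idea, here is a minimal sketch. The function name `label_sources`, the label strings, and the example accessions are all hypothetical, and it assumes the BLAST and Foldseek hits are already available as lists of protein IDs.

```python
def label_sources(blast_hits, foldseek_hits):
    """Label each protein ID by the search method(s) that found it.

    Hypothetical helper: assumes both inputs are iterables of ID strings.
    """
    blast = set(blast_hits)
    foldseek = set(foldseek_hits)
    labels = {}
    # Walk the union of both hit lists so every structure gets a label.
    for protein_id in sorted(blast | foldseek):
        if protein_id in blast and protein_id in foldseek:
            labels[protein_id] = "both"
        elif protein_id in blast:
            labels[protein_id] = "blast"
        else:
            labels[protein_id] = "foldseek"
    return labels


# Made-up accessions, for illustration only.
example = label_sources(["P12345", "Q67890"], ["Q67890", "A0A000"])
```

The resulting dictionary could then be joined onto the plotting table as a categorical column and used to color the points.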

We might also want to produce a river plot or a statistics file that reports the number of hits from BLAST and Foldseek and the percentage of hits with valid mappings to AlphaFold or PDB structures.
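The statistics could be computed along these lines. This is only a sketch: `summarize_hits`, its output keys, and the example IDs are hypothetical, and it assumes we already know which hit IDs mapped to a valid structure.

```python
def summarize_hits(blast_hits, foldseek_hits, mapped_ids):
    """Count hits per search method and the share that mapped to a structure.

    Hypothetical helper: `mapped_ids` is assumed to hold the IDs for which
    a valid AlphaFold or PDB structure was found.
    """
    blast = set(blast_hits)
    foldseek = set(foldseek_hits)
    all_hits = blast | foldseek
    valid = all_hits & set(mapped_ids)
    # Guard against division by zero when there are no hits at all.
    percent_valid = 100.0 * len(valid) / len(all_hits) if all_hits else 0.0
    return {
        "blast_only": len(blast - foldseek),
        "foldseek_only": len(foldseek - blast),
        "both": len(blast & foldseek),
        "total": len(all_hits),
        "valid": len(valid),
        "percent_valid": percent_valid,
    }


# Made-up IDs, for illustration only.
summary = summarize_hits(["P1", "P2", "P3"], ["P3", "P4"], ["P1", "P3", "P4"])
```

The same counts could feed a river plot, with one flow per search method into mapped and unmapped groups.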

mezarque commented 1 year ago

This is now handled by the get_source.py script.