biocore / microprot

structural annotation pipeline for microbial genomes and metagenomes
BSD 3-Clause "New" or "Revised" License

PDB match: parser #3

Closed tkosciol closed 7 years ago

tkosciol commented 7 years ago

Write a parser for phmmer output to identify fragments of the input sequence that match PDB entries.

tkosciol commented 7 years ago

Wait until I generate sample result files.

tkosciol commented 7 years ago

Sample results are on Barnacle in /projects/microprot/benchmarking/pdb_search.

Current way to proceed:

1) Read the first result from the .out file and check whether its E-value is below the threshold (default = 0.001).
2) Find the residue range covered by the hit.
3) Go down the list as long as results are below the E-value threshold. If a hit does not overlap with the first result, add it to the results list.
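The selection steps above could be sketched roughly as follows. This is only an illustration, not the parser itself: it assumes the hits have already been read from the .out file into records sorted by ascending E-value, and the names `Hit`, `overlaps` and `select_hits` are hypothetical. It also checks each candidate against every hit accepted so far, which is a slightly stricter reading than "non-overlapping with the first result".

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One parsed search hit (hypothetical record layout)."""
    pdb_id: str
    evalue: float
    start: int  # first matched query residue, 1-based, inclusive
    end: int    # last matched query residue, 1-based, inclusive


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one query residue."""
    return a.start <= b.end and b.start <= a.end


def select_hits(hits: List[Hit], max_evalue: float = 0.001) -> List[Hit]:
    """Keep hits below the E-value threshold that do not overlap an accepted hit.

    Assumes `hits` is sorted by ascending E-value, so the scan can stop
    at the first hit that fails the threshold.
    """
    selected: List[Hit] = []
    for hit in hits:
        if hit.evalue >= max_evalue:
            break  # everything further down the list is worse
        if not any(overlaps(hit, kept) for kept in selected):
            selected.append(hit)
    return selected
```

With made-up hits `Hit("1abcA", 1e-30, 1, 100)`, `Hit("2xyzB", 1e-10, 50, 150)` and `Hit("3defC", 1e-5, 120, 200)`, the second is dropped because it overlaps residues 1-100, and the other two are kept.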

Return:

1) a list of PDB IDs and their corresponding query coverage
2) all stretches of the query sequence (n >= 40 residues) not covered by a PDB hit
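As a rough sketch of the second return value, the uncovered stretches can be collected by walking over the covered residue ranges in order and keeping the gaps that are long enough. The function name `uncovered_fragments`, its tuple layout and the 1-based inclusive coordinates are assumptions made for illustration only.

```python
from typing import List, Tuple


def uncovered_fragments(
    sequence: str,
    covered: List[Tuple[int, int]],
    min_len: int = 40,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, subsequence) for query stretches not covered by a hit.

    `covered` holds 1-based, inclusive residue ranges; stretches shorter
    than `min_len` residues are discarded.
    """
    fragments: List[Tuple[int, int, str]] = []
    pos = 1  # first residue not yet known to be covered
    for start, end in sorted(covered):
        if start - pos >= min_len:
            # gap between the previous covered range and this one
            fragments.append((pos, start - 1, sequence[pos - 1:start - 1]))
        pos = max(pos, end + 1)
    if len(sequence) - pos + 1 >= min_len:
        # tail of the query after the last covered range
        fragments.append((pos, len(sequence), sequence[pos - 1:]))
    return fragments
```

For a 300-residue query with covered ranges 1-100 and 120-200, the 19-residue gap (101-119) is too short to report, so only residues 201-300 come back.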

sjanssen2 commented 7 years ago

Hey Tomasz, what do you mean by "query coverage"? A float, i.e. the percentage of the query sequence length covered by a hit, or the matching sub-sequence between hit and query?

tkosciol commented 7 years ago

The matching sub-sequence between hit and query, e.g. (1abcA, 1-100), meaning that structure 1abcA matches our query between residues 1 and 100.

sjanssen2 commented 7 years ago

Please see PR #19.

tkosciol commented 7 years ago

@sjanssen2 we can close this issue, right?

sjanssen2 commented 7 years ago

Yep. Done :-)