biocore / microprot

structural annotation pipeline for microbial genomes and metagenomes
BSD 3-Clause "New" or "Revised" License

IMPROVEMENT: split_sequence non_match fragment length #35

Closed: tkosciol closed this issue 7 years ago

tkosciol commented 7 years ago

split_sequence takes probability, E-value, and fragment length as criteria for grouping sub-sequences into "domains". I'd like fragment length to be a hard requirement, i.e. if a sub-sequence is shorter than requested, it is not reported as non_match. The other parameters (probability and E-value) help us decide whether a sub-sequence qualifies as a domain (e.g. is a PDB domain). Fragment length is different: it defines the "minimal interesting fragment". If a sequence is shorter than, say, 10 residues, we're not interested in modelling it at all. Hope that makes sense.
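To make the proposed rule concrete, here is a minimal sketch of a hard length cut-off applied to non_match fragments. It is not microprot's code: the function name `filter_short_fragments` and the representation of fragments as half-open `(start, end)` residue intervals are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation (not microprot's): a fragment is a
# half-open (start, end) interval of residue positions on the query.
Fragment = Tuple[int, int]


def filter_short_fragments(fragments: List[Fragment],
                           min_fragment_length: int) -> List[Fragment]:
    """Drop fragments shorter than min_fragment_length residues.

    Unlike probability or E-value cut-offs, this is a hard requirement:
    a fragment that is too short is never reported, whatever its scores.
    """
    return [(start, end) for start, end in fragments
            if end - start >= min_fragment_length]


if __name__ == "__main__":
    non_match = [(0, 4), (50, 120), (200, 209), (300, 310)]
    # Lengths are 4, 70, 9 and 10; only the last two survive a cut-off of 10.
    print(filter_short_fragments(non_match, min_fragment_length=10))
    # -> [(50, 120), (300, 310)]
```

The length check does not look at any score, which is the point of the request: a 9-residue stretch is dropped even if it would otherwise qualify.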

tkosciol commented 7 years ago

Please commit changes to the split_search branch.

tkosciol commented 7 years ago

@sjanssen2, is it as easy as changing this:

subseqs_neg = report_uncovered_subsequences(subseqs_pos, str(p),
                                                min_subseq_len=0)

to:

subseqs_neg = report_uncovered_subsequences(subseqs_pos, str(p),
                                                min_subseq_len=min_fragment_length)

🤔
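The effect of passing a non-zero `min_subseq_len` can be illustrated with a standalone sketch of uncovered-region detection. This is a hypothetical stand-in, not microprot's `report_uncovered_subsequences` (which, per the snippet above, takes the positive hits and the sequence string). Here `uncovered_subsequences` takes a sequence length and half-open hit intervals instead, and those inputs are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical: half-open [start, end) residue positions


def uncovered_subsequences(seq_len: int, covered: List[Interval],
                           min_subseq_len: int = 0) -> List[Interval]:
    """Return stretches of the sequence not covered by any hit.

    Gaps shorter than min_subseq_len are discarded; with the default of 0,
    every gap, however short, is reported.
    """
    gaps = []
    pos = 0  # first residue not yet known to be covered
    for start, end in sorted(covered):
        if start > pos:
            gaps.append((pos, start))
        # Overlapping or adjacent hits simply extend the covered region.
        pos = max(pos, end)
    if pos < seq_len:
        gaps.append((pos, seq_len))
    return [(s, e) for s, e in gaps if e - s >= min_subseq_len]


if __name__ == "__main__":
    hits = [(5, 60), (63, 150), (140, 180)]
    print(uncovered_subsequences(200, hits))
    # -> [(0, 5), (60, 63), (180, 200)]
    print(uncovered_subsequences(200, hits, min_subseq_len=10))
    # -> [(180, 200)]
```

With the cut-off left at 0, the 5- and 3-residue gaps would be reported as non_match fragments. Forwarding `min_fragment_length` removes them, which is the behaviour asked for in this issue.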

sjanssen2 commented 7 years ago

Yes, it is that easy :-)