BimberLab / nimble

nimble — execute lightweight, flexible alignments on arbitrary reference libraries
MIT License

Implement a reference-library level option for 'max-hits-to-report' #51

Closed: bbimber closed this issue 2 years ago

bbimber commented 2 years ago

A given read could have multiple passing alignments against different references. We either report those as a CSV string, or collapse to lineage and report those hits (if still ambiguous) as a CSV string. We should implement some type of max-hits-to-report flag in the library definition (or another spot TBD). If a given final reference combination has more references than this threshold, those reads are simply stored as not matching. Alternatively, we could make a new category such as 'Multi-Ref'.
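A minimal sketch of the proposed threshold check for a single read, not nimble's actual code. The function name `report_call`, the `max_hits_to_report` parameter, the `multi_ref_label` option, and sorting hits before joining them are all hypothetical; returning `None` stands in for "stored as not matching".

```python
from typing import Iterable, Optional


def report_call(
    hits: Iterable[str],
    max_hits_to_report: Optional[int],
    multi_ref_label: Optional[str] = None,
) -> Optional[str]:
    """Return the reported call for one read, or None if it is stored as not matching.

    hits: distinct reference names (or lineages) with a passing alignment.
    max_hits_to_report: hypothetical per-library threshold; None disables the filter.
    multi_ref_label: hypothetical option; if given, reads over the threshold get
        this category (e.g. 'Multi-Ref') instead of being stored as not matching.
    """
    unique_hits = sorted(set(hits))  # sorting is an assumption, for a stable CSV string
    if not unique_hits:
        return None
    if max_hits_to_report is not None and len(unique_hits) > max_hits_to_report:
        return multi_ref_label  # None here means "store as not matching"
    # Ambiguous calls are reported as a CSV string of references.
    return ",".join(unique_hits)


print(report_call(["B", "A"], 5))                    # A,B
print(report_call(list("ABCDGHR"), 5))               # None
print(report_call(list("ABCDGHR"), 5, "Multi-Ref"))  # Multi-Ref
```

Keeping the 'Multi-Ref' category optional lets both alternatives from the issue share one code path.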

We should apply this filter after lineage-coalescing, if that option is being used.

Examples:

'max-hits-to-report' = 5; raw hits = alleles A, B, C, D, G, H, R; collapsing on lineage not selected. Initial call: A,B,C,D,G,H,R. The number of hits is 7, which is greater than 5, so this call is filtered.

'max-hits-to-report' = 5; raw hits = Lineage1-AlleleA, Lineage1-AlleleB, Lineage1-AlleleC, Lineage2-AlleleD, Lineage2-AlleleG, Lineage2-AlleleF; collapsing on lineage is used. Initial call: Lineage1,Lineage2. The number of hits is 2, so the call is allowed even though 6 raw alleles were matched.
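A rough sketch of how the two examples play out when the threshold is applied after lineage coalescing. The issue does not say how lineage is stored; the `lineage_of` allele-to-lineage dict is a hypothetical stand-in for the library definition, and deriving it from the name prefix is purely illustrative. `call_read` is likewise a hypothetical name.

```python
from typing import Dict, Iterable, Optional


def call_read(
    allele_hits: Iterable[str],
    max_hits_to_report: Optional[int],
    lineage_of: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Collapse to lineage (if a mapping is given), then apply the threshold.

    lineage_of: hypothetical allele -> lineage mapping; None means lineage
        collapsing is not selected.
    Returns the CSV call, or None if the read is stored as not matching.
    """
    hits = set(allele_hits)
    if lineage_of is not None:
        # Coalesce first: several alleles of one lineage count as one hit.
        # Alleles missing from the mapping keep their own name.
        hits = {lineage_of.get(h, h) for h in hits}
    calls = sorted(hits)
    # The filter runs on the post-collapse count.
    if not calls or (max_hits_to_report is not None and len(calls) > max_hits_to_report):
        return None
    return ",".join(calls)


example1 = ["A", "B", "C", "D", "G", "H", "R"]
print(call_read(example1, 5))  # None: 7 hits > 5

example2 = [
    "Lineage1-AlleleA", "Lineage1-AlleleB", "Lineage1-AlleleC",
    "Lineage2-AlleleD", "Lineage2-AlleleG", "Lineage2-AlleleF",
]
# Illustrative only: derive lineage from the name prefix.
lineage_of = {a: a.split("-")[0] for a in example2}
print(call_read(example2, 5, lineage_of))  # Lineage1,Lineage2
print(call_read(example2, 5))              # None: without collapsing, 6 alleles > 5
```

The last line shows why the ordering matters: filtering on raw alleles would discard the second read, while filtering after coalescing keeps it as a two-lineage call.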