add test cases for maximum e-value filter on alignment results

Assertion: The maximum e-value for alignments in IDseq is 1.

Implementation Details: The maximum e-value threshold filter is applied in two different locations within the code base:

For short read alignments, the filter is applied inside the iterate_m8() function in the .m8 utils.
For contig alignments, the filter is applied using filters in PipelineStepBlastContigs.
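
As a rough illustration of the kind of filtering described above, here is a minimal sketch. It is not the actual iterate_m8() or PipelineStepBlastContigs code. The function name filter_m8_lines and the constants are hypothetical, and it assumes the standard 12-column, tab-separated BLAST tabular layout in which the e-value is the 11th column.

```python
from typing import Iterable, Iterator, List

# Threshold taken from the assertion in this description.
MAX_EVALUE_THRESHOLD = 1.0
# Assumption: 12-column BLAST tabular layout, e-value at 0-based index 10.
EVALUE_COLUMN = 10


def filter_m8_lines(
    lines: Iterable[str], max_evalue: float = MAX_EVALUE_THRESHOLD
) -> Iterator[List[str]]:
    """Yield the fields of each alignment row whose e-value is <= max_evalue."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Rows above the threshold are dropped; rows exactly at it are kept.
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            yield fields
```

Rows with an e-value exactly equal to 1 are kept here, matching the statement that no e-value > 1 should survive the filter.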

We expect that there may be alignments with e-values > 1 in the initial alignment files (gsnap.m8, rapsearch2.m8, gsnap.blast.m8, rapsearch2.blast.m8). The filter is then applied to the raw .m8 results when parsing for the top hits. There should never be e-values > 1 in the following files:

gsnap.deduped.m8
rapsearch2.deduped.m8
gsnap.blast.top.m8
rapsearch2.blast.top.m8
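
A check of this expectation could look something like the sketch below. The helper name count_rows_above_max_evalue is hypothetical and is not taken from the test cases in this PR. It again assumes the 12-column tab-separated layout with the e-value in the 11th column.

```python
def count_rows_above_max_evalue(
    m8_path: str, max_evalue: float = 1.0, evalue_column: int = 10
) -> int:
    """Count alignment rows in an .m8 file whose e-value exceeds max_evalue."""
    count = 0
    with open(m8_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            # Assumption: e-value sits at 0-based index 10.
            if float(fields[evalue_column]) > max_evalue:
                count += 1
    return count
```

With a helper like this, a test would assert that the count is zero for each of the four filtered files above, while allowing a nonzero count for the raw alignment files.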

This was implemented as part of https://github.com/chanzuckerberg/idseq-dag/pull/309

Test Sample: This was tested on staging using benchmark sample UnAmbiguouslyMapped_ds.gut. Staging sample ID 19379 was run prior to the fix, and staging sample ID 19361 was run after the fix.

For example, in sample 19361:

gsnap.m8 has 32 rows with e-value > 1, but gsnap.deduped.m8 has zero.
rapsearch2.m8 has 45 rows with e-value > 1, but rapsearch2.deduped.m8 has zero.
rapsearch2.blast.m8 has 5172 rows with e-value > 1, but rapsearch2.blast.top.m8 has zero.
