Users have observed that the read-pair counts in the log files do not add up properly (see the lines following `Reporting mapping statistics for paired end input`). We expect `$SSU_total_pairs` to equal the sum of `$ssu_pairs`, `%ssu_bad_pairs`, and `$mapped_half`.
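The expected relationship can be written as a simple consistency check. The sketch below is in Python rather than Perl. The function name and the example numbers are made up for illustration, and it assumes all four quantities are plain integer counts.

```python
def pair_counts_consistent(total_pairs, ssu_pairs, ssu_bad_pairs, mapped_half):
    """Return True if the three categories account for every counted pair."""
    return total_pairs == ssu_pairs + ssu_bad_pairs + mapped_half


# Illustrative numbers only, not taken from a real phyloFlash log.
print(pair_counts_consistent(100, 80, 5, 15))
print(pair_counts_consistent(100, 80, 5, 10))
```

A check of this kind, run on the values reported in the log, would flag the discrepancy that users are seeing.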
The regex in line 1194 of `phyloFlash.pl` does not cover all possible cases. Some read names contain this pattern internally and are wrongly split, so those reads are not counted correctly. Some libraries have a space before the `1` and `2` read segment suffixes, and some have no segment suffix at all. Adding a line-ending `$` anchor to the regex will not solve the problem either, because the `1` or `2` suffix may not be the last character of the read name.
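The failure modes described above can be reproduced with a small sketch. It is written in Python rather than Perl, and the pattern `/[12]` is a hypothetical stand-in, not the actual regex from line 1194 of `phyloFlash.pl`. The read names are invented, one per case.

```python
import re

# Hypothetical stand-in patterns; NOT copied from phyloFlash.pl.
SUFFIX = re.compile(r"/[12]")     # unanchored: matches anywhere in the name
ANCHORED = re.compile(r"/[12]$")  # anchored: matches only at the very end


def split_read_name(name):
    """Split a read name at the first match of SUFFIX.

    Returns (base_name, matched_suffix), or (name, None) if nothing matches.
    """
    match = SUFFIX.search(name)
    if match is None:
        return name, None
    return name[:match.start()], match.group(0)


# Made-up read names, one per case described in the text.
examples = [
    "read7/1",           # plain suffix: handled as intended
    "lib/2run_read7/1",  # pattern occurs inside the name: split too early
    "read7 1",           # space before the segment number: no match
    "read7",             # no segment suffix at all: no match
    "read7/1 extra",     # suffix present but not the last character
]

for name in examples:
    anchored_hit = ANCHORED.search(name) is not None
    print(repr(name), split_read_name(name), "anchored match:", anchored_hit)
```

Under these assumptions, the unanchored pattern splits `lib/2run_read7/1` at the internal `/2`, while the anchored pattern misses `read7/1 extra` entirely. This is why anchoring alone does not fix the counts.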
See https://github.com/HRGV/phyloFlash/commit/c50547f1437fb7abb3d0257d641cbbdc90ec58a7#commitcomment-89103251
The object `%qname_hash` does not appear to be used elsewhere and could likely be removed without problems. However, a similar regex is used in `PhyloFlash.pm`, lines 976, 1192, and 1213 (introduced in 4973a5e34c4ceb0a78f06690012aaa835801f78b).

Thanks to Yannick Colin for the detailed report that helped us find this issue.