bcgsc / biobloom

Create Bloom filters for a given reference and then use it to categorize sequences
http://www.bcgsc.ca/platform/bioinfo/software/biobloomtools
GNU General Public License v3.0

Revise --ordered option when using paired matching #18

Closed. JustinChu closed this issue 7 years ago.

JustinChu commented 7 years ago

--ordered does not work well with the paired reads option, because the greedy algorithm considers only one read at a time.

When it matches multiple reads, it may assign them to two different filters, which results in a no-match. This is generally not seen when the filters in use are unrelated, but it can be an issue when two filters share a substantial number of k-mers.

JustinChu commented 7 years ago

Revision 3de4d2e93675a7634bb877342826efa508919458 in the dev branch fixes the issue, but at the cost of removing the prescreening speed-up.

Side note: -c may be broken in all versions, even in single-end mode, since std::multimap does not guarantee the order of inserted elements that share the same key. This revision fixes that as well.

Prescreening should be re-added, possibly behind a separate option. When it works correctly, the speed-up and increased specificity still seem to be beneficial.