benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Memory error in AssignTaxonomy when working with very small sequences #1932

Open nearinj opened 2 months ago

nearinj commented 2 months ago

Hi Ben and team,

I just wanted to note that the assignTaxonomy function currently has a bug that causes it to run out of memory if you input a very small sequence. I noticed this because I was generating synthetic reads and accidentally included a sequence that was only 4 base pairs long.

When I ran the command I kept getting out-of-memory errors, and I was eventually able to trace them to not having filtered out very short reads. Afterward, I ran the command with just my 4-base-pair sequence and was able to reproduce the bug.

While I know this isn't a common issue and probably isn't a priority, I wanted to highlight it in case others run into it, or in case there is a simple fix for it in the future.

Thanks for continuing to support this software!

Cheers, Jacob Nearing

benjjneb commented 2 months ago

Thanks. Any sequence shorter than the k-mer size (8) will break the code. There is a check for short sequences on the reference database side, but I guess there isn't one on the query sequence side. That should be added.
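
For anyone hitting this before a check lands, the point about the k-mer size can be illustrated with a small sketch. This is not dada2's code, and it is written in Python rather than R purely as a language-agnostic illustration. It assumes the usual overlapping k-mer count of `L - k + 1` for a sequence of length `L`, which goes non-positive when `L < k`, and it shows a pre-filter that sets aside too-short queries before classification. The names `kmer_count` and `split_by_length` are hypothetical helpers; only the k-mer size of 8 comes from the comment above.

```python
KMER_SIZE = 8  # k-mer size quoted in the thread


def kmer_count(seq, k=KMER_SIZE):
    """Overlapping k-mer count under the usual L - k + 1 formula (not clamped)."""
    return len(seq) - k + 1


def split_by_length(seqs, k=KMER_SIZE):
    """Separate queries that contain at least one k-mer from those that do not."""
    keep = [s for s in seqs if len(s) >= k]
    dropped = [s for s in seqs if len(s) < k]
    return keep, dropped


# Example queries, including a 4 bp sequence like the one in the report.
queries = ["ACGT", "ACGTACGT", "ACGTACGTACGTACGT"]
keep, dropped = split_by_length(queries)

print(kmer_count("ACGT"))  # 4 - 8 + 1 = -3, i.e. no valid k-mers
print(keep)                # sequences safe to pass on to classification
print(dropped)             # sequences to report or discard
```

In an R workflow the same idea would amount to dropping query sequences shorter than 8 nt before calling assignTaxonomy; how the eventual fix inside the package handles such sequences is not stated in this thread.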