a-h-b / dadasnake

Amplicon sequencing workflow heavily using DADA2 and implemented in snakemake
GNU General Public License v3.0

dada_dadaReads.runpool.R fix #8

Closed by vmikk 3 years ago

vmikk commented 3 years ago

Hello Anna!

This PR fixes the following error:

Error in rep(seq(length(uniques)), tab[tab > 0]) : 
  invalid 'times' argument

It appears when an input file contains no reads even though the .gz file itself has a non-zero size (e.g. 20 bytes). See, for example, S253.fastq.gz: one of the samples failed, and this file was produced after quality filtering with dadasnake. We do not need to count all the lines in the input files (they can be quite large), so I use head; this is just a slightly modified version of a trick I've seen in your code here.
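To illustrate the idea of checking only the first few lines rather than counting all of them, here is a minimal sketch in base R. It is not the code from this PR (which uses head); `has_reads` is a hypothetical helper name, and the assumption is that a usable FASTQ file has at least one complete 4-line record.

```r
# Hypothetical helper (not part of dadasnake): TRUE if the gzipped FASTQ
# contains at least one complete 4-line record. Only the first `n_lines`
# lines are requested, so large files are not read in full.
has_reads <- function(path, n_lines = 4L) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  first <- readLines(con, n = n_lines, warn = FALSE)
  length(first) >= n_lines
}

# Demo with temporary files: a gzip file with no content and one with a single read.
empty_fq <- tempfile(fileext = ".fastq.gz")
con <- gzfile(empty_fq, open = "wt")
writeLines(character(0), con)
close(con)

full_fq <- tempfile(fileext = ".fastq.gz")
con <- gzfile(full_fq, open = "wt")
writeLines(c("@read1", "ACGT", "+", "IIII"), con)
close(con)

keep <- Filter(has_reads, c(empty_fq, full_fq))  # files that are safe to process
```

Filtering the file list this way before dereplication would skip samples that lost all their reads during quality filtering, whatever the size of the .gz file on disk.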

Also, if there are multiple input files in filt, derepFastq returns a list rather than a single object, so this line has no effect. We should therefore modify quals in each element of the list.
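The single-object versus list distinction can be sketched as follows. This is an assumption-laden illustration, not the change made in this PR: `update_quals`, `mock_derep`, and `cap40` are hypothetical names, the mock objects only mimic the shape of derepFastq output (a list-like object with a `quals` matrix), and the capping transformation is a placeholder for whatever the script actually does to quals.

```r
# Hypothetical helper: apply `fix_quals` to the quals matrix of one derep-like
# object, or to every element when given a list of such objects.
update_quals <- function(derep, fix_quals) {
  fix_one <- function(d) {
    d$quals <- fix_quals(d$quals)
    d
  }
  if (inherits(derep, "derep")) {
    fix_one(derep)           # single input file: one object
  } else {
    lapply(derep, fix_one)   # several input files: list of objects
  }
}

# Stand-ins for derepFastq output (not real dada2 objects).
mock_derep <- function(q) structure(list(quals = q), class = "derep")
single <- mock_derep(matrix(c(38, 41, 40, 42), nrow = 2))
multi  <- list(a = single, b = mock_derep(matrix(c(30, 45), nrow = 1)))

# Placeholder transformation for illustration only.
cap40 <- function(q) pmin(q, 40)

single_fixed <- update_quals(single, cap40)
multi_fixed  <- update_quals(multi, cap40)
```

Routing both cases through one helper keeps the modification identical whether filt holds one file or many.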

The other scripts are probably affected as well (e.g., dada_dadaReads.runpool.noError.R, dada_dadaReads.pool.R), but I haven't touched them because I don't want to break anything.

With kind regards, Vladimir

a-h-b commented 3 years ago

Hi Vladimir - thanks. I'll also fix the other scripts. -AHB