claraqin closed this issue 4 years ago.
I got to the root of this issue: the `filterAndTrim()` function, which is used by our `qualityFilterITS()` and `qualityFilter16S()` functions, has an argument called `compress` that is `TRUE` by default. This compresses the files output by the quality filter, but it doesn't update the filenames to end with ".gz". After filtering, the ITS sequences undergo primer trimming (`trimPrimersITS()`), and this function expects files ending with ".gz" to be compressed and files without ".gz" to be uncompressed. The mismatch between the actual compression and the filename is what confuses it.
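To make the mismatch concrete, here is a minimal Python sketch (an illustration, not the package's R code) that compares what a filename claims with what the file actually contains. It relies on the fact that every gzip stream begins with the two magic bytes `0x1f 0x8b`; the helper names `is_gzipped` and `compression_mismatch` are hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file content starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def compression_mismatch(path):
    """Return True when the '.gz' extension disagrees with the actual content."""
    return path.endswith(".gz") != is_gzipped(path)


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    # Simulate a filter step that compresses its output but keeps a plain '.fastq' name.
    misnamed = os.path.join(tmpdir, "sample_R1.fastq")
    with gzip.open(misnamed, "wt") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")
    print(is_gzipped(misnamed))            # True
    print(compression_mismatch(misnamed))  # True
```

A downstream step that trusts only the extension would treat `sample_R1.fastq` as plain text, even though its bytes are gzip.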
I still need to fix this. It's tricky because the DADA pipelines currently rely on filenames staying constant throughout, and appending ".gz" would change those filenames.
Fixed in the latest commit by gzipping all files as a final step in `organizeRawSequenceData()`.
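The idea of gzipping everything as a final step can be sketched in Python as follows. This is an illustration only, not the code in `organizeRawSequenceData()`; the function name `gzip_fastqs` is hypothetical. It assumes the inputs are plain `.fastq` files in a single folder, and it skips any file whose bytes are already gzip so nothing gets compressed twice.

```python
import gzip
import os
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gzip_fastqs(directory):
    """Gzip each plain '.fastq' file in `directory` into a '.fastq.gz' file.

    The uncompressed original is deleted afterwards. Files whose content is
    already gzip-compressed are skipped so they are not compressed twice.
    Returns the paths of the newly written '.fastq.gz' files.
    """
    created = []
    # os.listdir returns a snapshot, so the new '.gz' files are not revisited.
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".fastq"):
            continue
        src = os.path.join(directory, name)
        with open(src, "rb") as fh:
            if fh.read(2) == GZIP_MAGIC:
                continue  # already compressed despite the name; leave it alone
        dst = src + ".gz"
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # stream instead of reading the whole file
        os.remove(src)
        created.append(dst)
    return created
```

Because every output name then ends in ".gz" and every file is really gzip, later steps that infer compression from the extension see a consistent picture.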
Per a conversation between @zoey-rw and me, the `.fastq` files that are reorganized into the `0_raw` subfolders at the end of the `download-neon-data.Rmd` vignette need to be turned into `.fastq.gz` files, or else this error is reached during the `process-[16s/its]-sequences.Rmd` vignette:

The `.fastq` files are actually already compressed, so this can be addressed by including a line at the end of the `download-neon-data.Rmd` vignette that just appends ".gz" to the end of each filename.
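Since the content is already gzip, only the names need to change. A rough Python sketch of that rename-only step follows (the vignette itself is R, so this is an illustration; `add_gz_suffix` is a hypothetical name). It checks the gzip magic bytes of every file before renaming anything, so a genuinely uncompressed file aborts the whole step instead of being mislabelled.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def add_gz_suffix(directory):
    """Rename '.fastq' files whose content is already gzip to '.fastq.gz'.

    Nothing is recompressed; only the filenames change. Returns the new paths.
    Raises ValueError (before renaming anything) if any '.fastq' file is not
    actually gzip-compressed.
    """
    sources = [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.endswith(".fastq")
    ]
    # First pass: verify every file so a bad one aborts before any rename.
    for src in sources:
        with open(src, "rb") as fh:
            if fh.read(2) != GZIP_MAGIC:
                raise ValueError(f"{src} is not gzip-compressed; refusing to rename")
    # Second pass: append '.gz' to each verified file.
    renamed = []
    for src in sources:
        dst = src + ".gz"
        os.rename(src, dst)
        renamed.append(dst)
    return renamed
```

Validating first and renaming second keeps the folder in a consistent state if the assumption that the files are already compressed turns out to be wrong for any of them.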