MetaSUB-CAMP / camp_short-read-quality-control

Module won't process reads in fastq format #13

Closed katkopera closed 1 year ago

katkopera commented 1 year ago

Hi, in theory the module accepts FastQ files. From the documentation:

ingest_samples in workflow/utils.py expects Illumina reads in FastQ (may be gzipped) form

BUG: When I use samples in .fastq format (not gzipped), Snakemake gives the error:

Failed to process /net/ascratch/people/plgkkopera/camp_tests/short-read-quality-control/tmp/ERR3472499_2.fastq.gz
java.util.zip.ZipException: Not in GZIP format
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at uk.ac.babraham.FastQC.Utilities.MultiMemberGZIPInputStream.<init>(MultiMemberGZIPInputStream.java:37)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:80)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
    at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at uk.ac.babraham.FastQC.Utilities.MultiMemberGZIPInputStream.<init>(MultiMemberGZIPInputStream.java:37)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:80)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:159)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:121)
    at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)

This error is not surprising: after inspecting utils.py, it's clear that the ingest_samples() function creates tmp files with a '.fastq.gz' extension regardless of the file format provided, so FastQC tries to read an uncompressed file as gzip.
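One way to avoid the mismatch is to pick the tmp file extension from the file's content rather than hard-coding it. The following is only a minimal sketch, not the module's actual code: `is_gzipped` and `tmp_name_for` are hypothetical helper names, and the `{sample}_{read}` naming is an assumption based on the path in the error above. It relies on the fact that gzip files start with the two bytes `0x1f 0x8b`.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def tmp_name_for(path, sample, read):
    """Hypothetical helper: build a tmp file name whose extension
    matches what the input file actually contains."""
    ext = ".fastq.gz" if is_gzipped(path) else ".fastq"
    return f"{sample}_{read}{ext}"
```

With something like this, an uncompressed input would be linked as `ERR3472499_2.fastq` and a gzipped one as `ERR3472499_2.fastq.gz`, so downstream tools that trust the extension see the right format.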

It also expects read1 to be named _1.fastq.gz and read2 to be named _2.fastq.gz. What if somebody distinguishes read1 and read2 in a different way, for example sample_R1.fastq.gz and sample_R2.fastq.gz? This will again lead to an error.
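A more tolerant approach would be to parse the read number out of the file name with a pattern that accepts several common conventions. This is a rough sketch under my own assumptions, not what the module does: `parse_read_name` is a hypothetical function, and the set of accepted suffixes (`_1`, `_R1`, `_R1_001`, `.fq`, optional `.gz`) is just an illustrative choice.

```python
import re

# Accepts e.g. sample_1.fastq, sample_R1.fastq.gz, sample_R2_001.fq.gz
READ_SUFFIX = re.compile(
    r"^(?P<sample>.+?)[._]R?(?P<read>[12])(?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)


def parse_read_name(filename):
    """Hypothetical helper: split a file name into (sample, read number)."""
    match = READ_SUFFIX.match(filename)
    if match is None:
        raise ValueError(f"Cannot tell read 1 from read 2 in: {filename}")
    return match.group("sample"), int(match.group("read"))


print(parse_read_name("sample_R1.fastq.gz"))  # ('sample', 1)
print(parse_read_name("ERR3472499_2.fastq"))  # ('ERR3472499', 2)
```

Raising an explicit error for unrecognized names would at least point users at the naming problem instead of failing later inside FastQC.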

I don't need help debugging, because it works once I gzip my files. I'm just reporting an issue with the code.

lauren-mak commented 1 year ago

Fixed in the latest push. Thanks for suggesting this!