akcorut / kGWASflow

kGWASflow is a Snakemake workflow for performing k-mers-based GWAS.
https://github.com/akcorut/kGWASflow/wiki
MIT License

Questionable FASTQ format in test dataset #29

Open VanOverbeeke opened 7 months ago

VanOverbeeke commented 7 months ago

Hi,

I ran into an error when running the test dataset:

Started analysis of individual_99_R1.fastq
Failed to process file individual_99_R1.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Midline 'CCCFFFFFHHHHHJJJHGIJJJIJJGJGIJJJJIGIJJJIJJIFHGIIJIGJJFHEHI=DGGEEHHFFDFFFDEDEEDDBDBEFEEEEDDDDDDDDDDDDDDBDDDDDDCDDDDDDDDDDDDDADDDDDDBDDDDDDDCDDDDEDDEDEDE' didn't start with '+'
        at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:172)
        at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
        at java.base/java.lang.Thread.run(Thread.java:1583)

When I inspected the FASTQ files manually, I found that not all of them have a line count that is a multiple of 4. Since each FASTQ record spans exactly 4 lines (header, sequence, '+' separator, quality), the records in these files become misaligned, and FastQC reads a quality string where it expects the '+' line, as seen in the error message above. For example, this file has 1018 lines: https://github.com/akcorut/kGWASflow/blob/main/.test/data/test_reads/individual_99_R1.fastq.
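
In case it helps to find the other affected files, here is a minimal sketch of the check I have in mind. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequences). The function name `first_fastq_problem` is my own and not part of kGWASflow or FastQC, and the default directory is taken from the link above.

```python
import sys
from pathlib import Path


def first_fastq_problem(lines):
    """Return a description of the first structural problem, or None if OK.

    Assumes four-line records: '@' header, sequence, '+' separator, quality.
    Records are checked by position, because a quality line may itself
    start with '@' or '+'.
    """
    lines = [line.rstrip("\n") for line in lines]
    # Walk over every complete 4-line record.
    for start in range(0, len(lines) - 3, 4):
        header, seq, sep, qual = lines[start:start + 4]
        if not header.startswith("@"):
            return f"line {start + 1}: expected '@' header"
        if not sep.startswith("+"):
            return f"line {start + 3}: expected '+' separator"
        if len(seq) != len(qual):
            return f"line {start + 4}: quality length differs from sequence length"
    # Leftover lines mean the last record is truncated.
    if len(lines) % 4 != 0:
        return f"{len(lines)} lines, not a multiple of 4"
    return None


if __name__ == "__main__":
    # Directory to scan; defaults to the test data path from the repository.
    reads_dir = Path(sys.argv[1] if len(sys.argv) > 1 else ".test/data/test_reads")
    for fastq in sorted(reads_dir.glob("*.fastq")):
        with open(fastq) as handle:
            problem = first_fastq_problem(handle)
        print(f"{fastq.name}: {problem or 'OK'}")
```

Run from the repository root, this should print one line per FASTQ file, either `OK` or the first problem found. It stops at the first problem in each file because every record after a misalignment would be reported as broken too.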