GenomicsAotearoa / metagenomics_summer_school

Course materials for the Genomics Aotearoa Metagenomics Summer School, to be hosted at the University of Auckland in September
https://genomicsaotearoa.github.io/metagenomics_summer_school/
GNU General Public License v3.0

Trimmomatic #80

anniewest closed this issue 2 months ago

anniewest commented 3 months ago

The operation is killed before trimming completes (with default Jupyter session settings):

trimmomatic PE -threads 4 -phred33 \
    mock_R1.adapter_decay.fastq.gz mock_R2.adapter_decay.fastq.gz \
    mock_R1.qc.fastq.gz mock_s1.qc.fastq.gz mock_R2.qc.fastq.gz mock_s2.qc.fastq.gz \
    ILLUMINACLIP:NexteraPE-PE.fa:1:25:7 SLIDINGWINDOW:4:30 MINLEN:80

TrimmomaticPE: Started with arguments:
 -threads 4 -phred33 mock_R1.adapter_decay.fastq.gz mock_R2.adapter_decay.fastq.gz mock_R1.qc.fastq.gz mock_s1.qc.fastq.gz mock_R2.qc.fastq.gz mock_s2.qc.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:1:25:7 SLIDINGWINDOW:4:30 MINLEN:80
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
/opt/nesi/CS400_centos7_bdw/Trimmomatic/0.39-Java-1.8.0_144/trimmomatic: line 2: 16396 Killed java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.39.jar "$@"

Running fastqc on the truncated output file:

fastqc mock_R1.qc.fastq.gz

application/gzip
Started analysis of mock_R1.qc.fastq.gz
Approx 5% complete for mock_R1.qc.fastq.gz
Approx 10% complete for mock_R1.qc.fastq.gz
Approx 15% complete for mock_R1.qc.fastq.gz
Approx 20% complete for mock_R1.qc.fastq.gz
Approx 25% complete for mock_R1.qc.fastq.gz
Approx 30% complete for mock_R1.qc.fastq.gz
Approx 35% complete for mock_R1.qc.fastq.gz
Approx 40% complete for mock_R1.qc.fastq.gz
Approx 45% complete for mock_R1.qc.fastq.gz
Approx 50% complete for mock_R1.qc.fastq.gz
Approx 55% complete for mock_R1.qc.fastq.gz
Approx 60% complete for mock_R1.qc.fastq.gz
Approx 65% complete for mock_R1.qc.fastq.gz
Approx 70% complete for mock_R1.qc.fastq.gz
Approx 75% complete for mock_R1.qc.fastq.gz
Approx 80% complete for mock_R1.qc.fastq.gz
Approx 85% complete for mock_R1.qc.fastq.gz
Approx 90% complete for mock_R1.qc.fastq.gz
Approx 95% complete for mock_R1.qc.fastq.gz
Failed to process file mock_R1.qc.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry. Your file is probably truncated
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:187)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:129)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
    at java.lang.Thread.run(Thread.java:748)
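A truncated output like this can usually be caught before FastQC is run, since an interrupted write typically leaves an incomplete gzip stream. The following is a minimal sketch, assuming `gzip` is available in the session; `check_gz` is a hypothetical helper name, not part of the course material or of Trimmomatic.

```shell
# Hypothetical helper: test the integrity of a gzip-compressed file.
# `gzip -t` reads the whole stream and exits non-zero if it is incomplete or corrupt.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "MISSING, TRUNCATED OR CORRUPT: $1"
        return 1
    fi
}
```

For example, `check_gz mock_R1.qc.fastq.gz && fastqc mock_R1.qc.fastq.gz` would only start FastQC if the trimmed file passes the check.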

DininduSenanayake commented 3 months ago

I copied the block from the material, ran it on a 4C/8G session, and could not reproduce this error:

$ trimmomatic PE -threads 4 -phred33 \
>                mock_R1.adapter_decay.fastq.gz mock_R2.adapter_decay.fastq.gz \
>                mock_R1.qc.fastq.gz mock_s1.qc.fastq.gz mock_R2.qc.fastq.gz mock_s2.qc.fastq.gz \
>                ILLUMINACLIP:NexteraPE-PE.fa:1:25:7 SLIDINGWINDOW:4:30 MINLEN:80
TrimmomaticPE: Started with arguments:
 -threads 4 -phred33 mock_R1.adapter_decay.fastq.gz mock_R2.adapter_decay.fastq.gz mock_R1.qc.fastq.gz mock_s1.qc.fastq.gz mock_R2.qc.fastq.gz mock_s2.qc.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:1:25:7 SLIDINGWINDOW:4:30 MINLEN:80
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 2000000 Both Surviving: 1160013 (58.00%) Forward Only Surviving: 340638 (17.03%) Reverse Only Surviving: 124184 (6.21%) Dropped: 375165 (18.76%)
TrimmomaticPE: Completed successfully

mlhoggard commented 3 months ago

Sounds like it may just be a connection dropout while running the trimmomatic step? (That step is run in the terminal rather than as a Slurm job...)

DininduSenanayake commented 2 months ago

I think it is still related to background CPUs and -threads, as oversubscribing the latter triggered this a few times. Looks like a bug with this particular version of Trimmomatic.
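One way to avoid oversubscribing -threads is to cap the requested value at the number of CPUs the session actually has. This is only a sketch under assumptions: `pick_threads` is a hypothetical helper, and it assumes that `SLURM_CPUS_PER_TASK` is set when the session runs inside a Slurm allocation, falling back to `nproc` otherwise.

```shell
# Hypothetical helper: print the smaller of the requested thread count
# and the number of CPUs available to this session.
pick_threads() {
    local requested="$1"
    # Assumption: SLURM_CPUS_PER_TASK reflects the allocation when it is set;
    # otherwise use the CPU count reported by nproc.
    local available="${SLURM_CPUS_PER_TASK:-$(nproc)}"
    echo $(( requested < available ? requested : available ))
}
```

The result could then be passed to the command from the material as `-threads "$(pick_threads 4)"`, so a 2-CPU session would run with 2 threads instead of 4.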