FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

cutadapt: error: [Errno 2] No such file or directory: 'file.fastq\xa0' #117

Closed Hannah1746 closed 3 years ago

Hannah1746 commented 3 years ago

I am running trim_galore on a transcriptome. This is the command I ran:

trim_galore --paired Gyrinocheilus_trancriptome.1_1.fastq Gyrinocheilus_trancriptome.1_2.fastq

This is what is written in the terminal:

Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 2.3
single-core operation.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

AUTO-DETECTING ADAPTER TYPE

Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> Gyrinocheilus_trancriptome.1_1.fastq <<)

Found perfect matches for the following adapter sequences:
Adapter type  Count  Sequence       Sequences analysed  Percentage
Illumina      187    AGATCGGAAGAGC  1000000             0.02
Nextera       9      CTGTCTCTTATA   1000000             0.00
smallRNA      2      TGGAATTCTCGG   1000000             0.00
Using Illumina adapter for trimming (count: 187). Second best hit was Nextera (count: 9)

Writing report to 'Gyrinocheilus_trancriptome.1_1.fastq_trimming_report.txt'

SUMMARISING RUN PARAMETERS

Input filename: Gyrinocheilus_trancriptome.1_1.fastq
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 2.3
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp

Cutadapt seems to be fairly up-to-date (version 2.3). Setting -j 1
Writing final adapter and quality trimmed output to Gyrinocheilus_trancriptome.1_1_trimmed.fq

Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file Gyrinocheilus_trancriptome.1_1.fastq <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.3 with Python 3.7.0
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC Gyrinocheilus_trancriptome.1_1.fastq
Processing reads on 1 core in single-end mode ...
Finished in 377.63 s (16 us/read; 3.79 M reads/minute).

=== Summary ===

Total reads processed: 23,866,065
Reads with adapters: 5,264,537 (22.1%)
Reads written (passing filters): 23,866,065 (100.0%)

Total basepairs processed: 2,171,811,915 bp
Quality-trimmed: 5,466,756 bp (0.3%)
Total written (filtered): 2,157,634,932 bp (99.3%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 5264537 times.

No. of allowed errors: 0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
A: 29.5%
C: 26.9%
G: 19.9%
T: 23.7%
none/other: 0.0%

Overview of removed sequences
length count expect max.err error counts
1 2807456 5966516.2 0 2807456
2 2110687 1491629.1 0 2110687
3 200393 372907.3 0 200393
4 74034 93226.8 0 74034
5 51293 23306.7 0 51293
6 9812 5826.7 0 9812
7 884 1456.7 0 884
8 79 364.2 0 79
9 586 91.0 0 20 566
10 679 22.8 1 32 647
11 336 5.7 1 13 323
12 120 1.4 1 12 108
13 121 0.4 1 37 84
14 78 0.4 1 52 26
15 91 0.4 1 66 25
16 100 0.4 1 59 41
17 82 0.4 1 57 25
18 68 0.4 1 45 23
19 84 0.4 1 59 25
20 87 0.4 1 50 37
21 101 0.4 1 55 46
22 97 0.4 1 57 40
23 95 0.4 1 53 42
24 105 0.4 1 54 51
25 78 0.4 1 49 29
26 116 0.4 1 61 55
27 116 0.4 1 53 63
28 83 0.4 1 51 32
29 86 0.4 1 62 24
30 92 0.4 1 57 35
31 90 0.4 1 56 34
32 72 0.4 1 45 27
33 76 0.4 1 55 21
34 91 0.4 1 55 36
35 104 0.4 1 77 27
36 70 0.4 1 53 17
37 100 0.4 1 66 34
38 80 0.4 1 53 27
39 105 0.4 1 74 31
40 86 0.4 1 59 27
41 104 0.4 1 67 37
42 77 0.4 1 51 26
43 98 0.4 1 62 36
44 74 0.4 1 46 28
45 117 0.4 1 49 68
46 103 0.4 1 53 50
47 103 0.4 1 68 35
48 81 0.4 1 44 37
49 94 0.4 1 67 27
50 81 0.4 1 58 23
51 88 0.4 1 47 41
52 79 0.4 1 48 31
53 89 0.4 1 59 30
54 85 0.4 1 48 37
55 77 0.4 1 45 32
56 87 0.4 1 50 37
57 82 0.4 1 46 36
58 82 0.4 1 55 27
59 104 0.4 1 74 30
60 83 0.4 1 60 23
61 108 0.4 1 66 42
62 86 0.4 1 52 34
63 116 0.4 1 68 48
64 139 0.4 1 80 59
65 132 0.4 1 85 47
66 137 0.4 1 82 55
67 127 0.4 1 77 50
68 162 0.4 1 99 63
69 195 0.4 1 100 95
70 209 0.4 1 73 136
71 225 0.4 1 73 152
72 157 0.4 1 87 70
73 119 0.4 1 91 28
74 134 0.4 1 98 36
75 153 0.4 1 120 33
76 135 0.4 1 98 37
77 140 0.4 1 92 48
78 134 0.4 1 101 33
79 91 0.4 1 63 28
80 95 0.4 1 70 25
81 79 0.4 1 44 35
82 116 0.4 1 82 34
83 103 0.4 1 71 32
84 62 0.4 1 41 21
85 39 0.4 1 27 12
86 78 0.4 1 49 29
87 35 0.4 1 18 17
88 124 0.4 1 70 54
89 48 0.4 1 29 19
90 30 0.4 1 8 22
91 328 0.4 1 269 59

RUN STATISTICS FOR INPUT FILE: Gyrinocheilus_trancriptome.1_1.fastq

23866065 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to 'Gyrinocheilus_trancriptome.1_2.fastq _trimming_report.txt'

SUMMARISING RUN PARAMETERS

Input filename: Gyrinocheilus_trancriptome.1_2.fastq 
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 2.3
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp

Cutadapt seems to be fairly up-to-date (version 2.3). Setting -j -j 1
Writing final adapter and quality trimmed output to Gyrinocheilus_trancriptome.1_2.fastq _trimmed.fq

Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file Gyrinocheilus_trancriptome.1_2.fastq  <<<
This is cutadapt 2.3 with Python 3.7.0
Command line parameters: -j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC Gyrinocheilus_trancriptome.1_2.fastq 
Run "cutadapt --help" to see command-line options.
See https://cutadapt.readthedocs.io/ for full documentation.

cutadapt: error: [Errno 2] No such file or directory: 'Gyrinocheilus_trancriptome.1_2.fastq\xa0'

Cutadapt terminated with exit signal: '512'. Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...

It seems like the first file of the read pair runs fine, but I don't understand why the second one is failing.

Hannah1746 commented 3 years ago

I just re-ran the command and it is working fine now...

FelixKrueger commented 3 years ago

Good to hear. My guess would have been that you accidentally appended something weird to the end of the filename for R2 ('Gyrinocheilus_trancriptome.1_2.fastq\xa0'). Let's just forget about it though :) Best of luck!
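
For anyone who lands here with the same error: the `\xa0` in the message is how Python displays a non-breaking space (U+00A0). It looks like an ordinary space in the terminal, but most shells do not split on it, so it ends up as part of the filename. It typically sneaks in when a command is copied from a web page or a word processor. The sketch below is not part of Trim Galore or Cutadapt. `find_suspect_chars`, `clean`, and the `SUSPECT` set are hypothetical helper names, and the character list is an assumption about what is worth checking, not an exhaustive one.

```python
import unicodedata

# Assumption: invisible or space-like characters that commonly survive copy-paste.
# None of them is treated as a word separator by the shell.
SUSPECT = {
    "\u00a0",  # no-break space (shown by Python as \xa0)
    "\u2007",  # figure space
    "\u202f",  # narrow no-break space
    "\u200b",  # zero width space
    "\ufeff",  # zero width no-break space / byte order mark
}


def find_suspect_chars(arg):
    """Return (index, Unicode name) for every suspicious character in arg."""
    return [
        (i, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(arg)
        if ch in SUSPECT
    ]


def clean(arg):
    """Return arg with all suspicious characters removed."""
    return "".join(ch for ch in arg if ch not in SUSPECT)


# The R2 filename exactly as Cutadapt reported it in this issue.
reported = "Gyrinocheilus_trancriptome.1_2.fastq\xa0"

for index, name in find_suspect_chars(reported):
    print(f"position {index}: {name}")

# repr() makes the hidden character visible; clean() gives the intended filename.
print(repr(reported))
print(repr(clean(reported)))
```

Re-typing the command by hand instead of pasting it removes the hidden character as well, which may be why the re-run above worked.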