Comparison of the vendor-provided cutadapt command with the pipeline command

komaljain3 commented 1 year ago

Kristie asked the vendor (who supplies the sequencing library kit) about adapter trimming, and they shared the following information with us:

Hello Kristine,

To trim adapters from reads we recommend using cutadapt, setting the minimum insert to 15 bases. Then align your libraries to a miRbase reference for the organism you are studying. I’ve also linked the instructions for v3 here for more detailed instructions. Trimming for v4 is the same just easier (skip step 2).

Best, Allyson

Allyson C. LeBas | Technical Services Support Rep NEXTFLEX NGS PerkinElmer | For the Better

Method1: The email above links to this document:

https://perkinelmer-appliedgenomics.com/wp-content/uploads/marketing/NEXTFLEX/miRNA/NEXTflex_Small_RNA_v3_Trimming_Instructions.pdf

The instructions here include:

NEXTflex™ Small RNA Trimming Instructions Sequencing reads generated with the NEXTflex Small RNA Seq Kit v3 require trimming of adapter sequences and random bases prior to alignment when using an end-to-end alignment mode. These trimming steps can be accomplished using cutadapt (https://cutadapt.readthedocs.io/en/stable/), which is free to download.

The following commands require the latest version of cutadapt, which can be installed using the directions found at http://cutadapt.readthedocs.io/en/stable/installation.html.

Trim 3’ adapters:
cutadapt -a TGGAATTCTCGGGTGCCAAGG -o YOUR_FILE.trim1.fq --minimum-length 23 YOUR_FILE.fastq.gz
This command excludes any inserts shorter than 15 bases (the minimum length of 23 accounts for 8 randomized bases plus a minimum 15-base insert). It takes “YOUR_FILE.fastq.gz” as input and writes to “YOUR_FILE.trim1.fq”.

Trim 4 bases from each end of every read:
cutadapt -u 4 -u -4 -o YOUR_FILE.trim2.fq YOUR_FILE.trim1.fq
From here, you can align “YOUR_FILE.trim2.fq” to an appropriate reference using an aligner such as bowtie2.
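The two vendor steps can be written as a small helper that derives `--minimum-length` from the 8 randomized bases (4 per side) plus the 15-base minimum insert instead of hard-coding 23. This is only a sketch: the function name `nextflex_v3_cmds` and its two arguments are hypothetical, and it prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two NEXTflex v3 trimming commands for one sample.
# Values are taken from the vendor instructions: 3' adapter TGGAATTCTCGGGTGCCAAGG,
# 4 randomized bases on each side of the insert, minimum insert of 15 bases.
nextflex_v3_cmds() {
  local input=$1 prefix=$2
  local adapter=TGGAATTCTCGGGTGCCAAGG
  local random_per_side=4 min_insert=15
  # 8 randomized bases (4 + 4) + 15-base insert = 23
  local min_len=$(( 2 * random_per_side + min_insert ))
  # Step 1: trim the 3' adapter and drop reads shorter than min_len
  printf 'cutadapt -a %s -o %s.trim1.fq --minimum-length %d %s\n' \
    "$adapter" "$prefix" "$min_len" "$input"
  # Step 2: remove the randomized bases from both ends of every read
  printf 'cutadapt -u %d -u -%d -o %s.trim2.fq %s.trim1.fq\n' \
    "$random_per_side" "$random_per_side" "$prefix" "$prefix"
}
```

Usage: `nextflex_v3_cmds YOUR_FILE.fastq.gz YOUR_FILE | bash` runs both steps in order; leaving off `| bash` shows the commands for review.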

Method2: Command used in the old CGR pipeline

cutadapt -b {params.adapter} -m 15 -M 31 --too-short-output={output[1]} --too-long-output={output[2]} -q 10,10 -o {output[0]} {input} 2>log/{wildcards.sample}_cutadapt.err

-b = adapter that may occur at either the 5' or the 3' end of the read (both kinds of match are allowed)
-m = minimum length (discard processed reads that are shorter than LENGTH)
-M = maximum length (discard processed reads that are longer than LENGTH)
--too-short-output = instead of discarding reads that are too short according to -m, write them to FILE (in FASTA/FASTQ format)
--too-long-output = instead of discarding reads that are too long according to -M, write them to FILE (in FASTA/FASTQ format)
-q 10,10 = trim low-quality ends from reads (quality cutoff 10 at the 5' end and 10 at the 3' end)
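The -m/-M pair decides each read's fate after adapter and quality trimming, which is what the "Read fate breakdown" section of the cutadapt reports counts. A minimal sketch of that rule, using the pipeline's bounds as defaults (the function name `read_fate` is hypothetical, not part of cutadapt):

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a read by its length *after* trimming,
# mirroring -m (discard if shorter than MIN) and -M (discard if longer than MAX).
read_fate() {
  local len=$1 min=${2:-15} max=${3:-31}
  if (( len < min )); then
    echo too_short   # written to --too-short-output if given, otherwise discarded
  elif (( len > max )); then
    echo too_long    # written to --too-long-output if given, otherwise discarded
  else
    echo written     # passes both filters and goes to -o
  fi
}
```

Both bounds are inclusive: a 15-base or a 31-base read is kept with `-m 15 -M 31`.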

The adapter FASTA file used for the second method was modified from:

>PrefixPE/1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
>PE1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PE1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
>PE2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>Head
CCCTACACGACGCTCTTCCGATCT
>Head_rc
AGATCGGAAGAGCGTCGTGTAGGG

to this new file, which also includes the vendor-provided NEXTflex adapter and its reverse complement:

>NEXTflex
TGGAATTCTCGGGTGCCAAGG
>NEXTflex_rc
CCTTGGCACCCGAGAATTCCA
>TruSeq_Adaptor_5p
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>TruSeq_Adaptor_5p_rc
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
>PE1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PE1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
>PE2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>Head
CCCTACACGACGCTCTTCCGATCT
>Head_rc
AGATCGGAAGAGCGTCGTGTAGGG
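Each `_rc` entry in the adapter file is meant to be the reverse complement of its partner (for example, NEXTflex_rc is the reverse complement of the vendor's 3' adapter). A quick bash check like the one below can confirm that after editing the file. It is a sketch: `revcomp` and `check_rc_pairs` are hypothetical helpers, not part of the pipeline, and they assume A/C/G/T sequences and the `<name>` / `<name>_rc` naming used above (entries without an `_rc` partner, such as PrefixPE/1, are simply skipped).

```shell
#!/usr/bin/env bash
# Reverse complement a DNA sequence (pure bash, no rev/tr dependency).
revcomp() {
  local seq=$1 out="" i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    case ${seq:i:1} in
      A) out+=T ;; C) out+=G ;; G) out+=C ;; T) out+=A ;;
      *) out+=N ;;                  # anything else (e.g. N) becomes N
    esac
  done
  printf '%s\n' "$out"
}

# Read a FASTA file on stdin and check that every "<name>_rc" entry is the
# reverse complement of "<name>". Prints problems; returns 1 if any were found.
check_rc_pairs() {
  local -A seqs=()
  local line name="" fwd status=0
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue
    if [[ $line == '>'* ]]; then
      name=${line#>}
    else
      seqs[$name]+=$line            # also handles multi-line FASTA records
    fi
  done
  for name in "${!seqs[@]}"; do
    [[ $name == *_rc ]] || continue
    fwd=${name%_rc}
    if [[ -z ${seqs[$fwd]+set} ]]; then
      echo "no forward entry for $name"; status=1
    elif [[ $(revcomp "${seqs[$fwd]}") != "${seqs[$name]}" ]]; then
      echo "not a reverse complement: $name"; status=1
    fi
  done
  return "$status"
}
```

Usage: `check_rc_pairs < adapters.fa && echo "all _rc entries match"`.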

The two trimming methods above were compared; the results are below.

komaljain3 commented 1 year ago

Results from Method1:

Trimmed with NEXTflex cutadapt parameters

INPUT_FASTQ=/DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/sample_test_2018_data/Gencode_microRNA-seq/merged_fastq/SC299521_ATCACG.fastq.gz

SAMPLE=SC299521_ATCACG

export SINGULARITY_BINDPATH=/DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/sample_test_2018_data/Gencode_microRNA-seq/merged_fastq,/DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/cutadapt_trim_test

singularity pull docker://quay.io/biocontainers/cutadapt:4.4--py39hf95cd2a_1

singularity exec cutadapt_4.4--py39hf95cd2a_1.sif cutadapt -a TGGAATTCTCGGGTGCCAAGG -o ${SAMPLE}.trim1.fq --minimum-length 23 $INPUT_FASTQ

singularity exec cutadapt_4.4--py39hf95cd2a_1.sif cutadapt -u 4 -u -4 -o ${SAMPLE}.trim2.fq ${SAMPLE}.trim1.fq

Output

Command 1

This is cutadapt 4.4 with Python 3.9.16
Command line parameters: -a TGGAATTCTCGGGTGCCAAGG -o SC299521_ATCACG.trim1.fq --minimum-length 23 /DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/sample_test_2018_data/Gencode_microRNA-seq/merged_fastq/SC299521_ATCACG.fastq.gz
Processing single-end reads on 1 core ...
Done           00:01:30    28,407,041 reads @   3.2 µs/read;  18.75 M reads/minute
Finished in 91.523 s (3.222 µs/read; 18.62 M reads/minute).

=== Summary ===

Total reads processed:              28,407,041
Reads with adapters:                    28,386 (0.1%)

== Read fate breakdown ==
Reads that were too short:                   0 (0.0%)
Reads written (passing filters):    28,407,041 (100.0%)

Total basepairs processed: 1,008,769,132 bp
Total written (filtered):  1,008,657,361 bp (100.0%)

=== Adapter 1 ===

Sequence: TGGAATTCTCGGGTGCCAAGG; Type: regular 3'; Length: 21; Trimmed: 28386 times

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Bases preceding removed adapters:
  A: 21.0%
  C: 17.2%
  G: 43.6%
  T: 18.2%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
3   14927   443860.0    0   14927
4   3335    110965.0    0   3335
5   8424    27741.3 0   8424
6   1260    6935.3  0   1260
7   111 1733.8  0   111
8   87  433.5   0   87
9   46  108.4   0   0 46
10  77  27.1    1   0 77
11  115 6.8 1   0 115
12  4   1.7 1   0 4

Command 2

This is cutadapt 4.4 with Python 3.9.16
Command line parameters: -u 4 -u -4 -o SC299521_ATCACG.trim2.fq SC299521_ATCACG.trim1.fq
Processing single-end reads on 1 core ...
Done           00:01:20    28,407,041 reads @   2.8 µs/read;  21.21 M reads/minute
Finished in 80.966 s (2.850 µs/read; 21.05 M reads/minute).

=== Summary ===

Total reads processed:              28,407,041
Reads written (passing filters):    28,407,041 (100.0%)

Total basepairs processed: 1,008,657,361 bp
Total written (filtered):    781,401,033 bp (77.5%)
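The Command 2 numbers are consistent with every read losing exactly 8 bases (4 from each end); no read could be shorter than that, since Command 1 had already required at least 23 bases. A quick arithmetic check, with the values copied from the reports above:

```shell
#!/usr/bin/env bash
# Values copied from the Command 2 report above.
reads=28407041          # Total reads processed
bp_in=1008657361        # Total basepairs processed
bp_out=781401033        # Total written (filtered)

# -u 4 -u -4 removes 4 bases from the 5' end and 4 from the 3' end of every read.
bases_removed_per_read=8
expected_out=$(( bp_in - bases_removed_per_read * reads ))

echo "expected: $expected_out  reported: $bp_out"
# Share of bases kept, as printed in the report (77.5%).
awk -v a="$bp_out" -v b="$bp_in" 'BEGIN { printf "kept: %.1f%%\n", 100 * a / b }'
```

Both numbers come out to 781,401,033 bp.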

komaljain3 commented 1 year ago

Results from Method2:

This is cutadapt 4.4 with Python 3.9.16
Command line parameters: -b file:adapters.fa -m 15 -M 31 --too-short-output=trimmed/SC299521_ATCACG_too_short.fastq.gz --too-long-output=trimmed/SC299521_ATCACG_too_long.fastq.gz -q 10,10 -o trimmed/SC299521_ATCACG.trim.fastq.gz /DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/sample_test_2018_data/Gencode_microRNA-seq/merged_fastq/SC299521_ATCACG.fastq.gz --cores=10
Processing single-end reads on 10 cores ...
Finished in 43.634 s (1.536 µs/read; 39.06 M reads/minute).

=== Summary ===

Total reads processed:              28,407,041
Reads with adapters:                 2,981,126 (10.5%)

== Read fate breakdown ==
Reads that were too short:          10,094,132 (35.5%)
Reads that were too long:            2,474,224 (8.7%)
Reads written (passing filters):    15,838,685 (55.8%)

Total basepairs processed: 1,008,769,132 bp
Quality-trimmed:             519,656,268 bp (51.5%)
Total written (filtered):    380,479,308 bp (37.7%)

=== Adapter NEXTflex3 ===

Sequence: TGGAATTCTCGGGTGCCAAGG; Type: variable 5'/3'; Length: 21; Trimmed: 580802 times
37620 times, it overlapped the 5' end of a read
543182 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Overview of removed sequences (5')
length	count	expect	max.err	error counts
3	21746	443860.0	0	21746
4	14263	110965.0	0	14263
5	839	27741.3	0	839
6	151	6935.3	0	151
7	604	1733.8	0	604
8	6	433.5	0	6
9	3	108.4	0	0 3
10	4	27.1	1	0 4
11	2	6.8	1	0 2
12	2	1.7	1	0 2

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts
3	363601	443860.0	0	363601
4	82119	110965.0	0	82119
5	17686	27741.3	0	17686
6	3307	6935.3	0	3307
7	189	1733.8	0	189
8	85	433.5	0	85
9	79	108.4	0	23 56
10	28643	27.1	1	80 28563
11	40516	6.8	1	90 40426
12	6837	1.7	1	12 6825
13	120	0.4	1	0 120

=== Adapter NEXTflex3_rc ===

Sequence: CCTTGGCACCCGAGAATTCCA; Type: variable 5'/3'; Length: 21; Trimmed: 326632 times
103419 times, it overlapped the 5' end of a read
223213 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Overview of removed sequences (5')
length	count	expect	max.err	error counts
3	34909	443860.0	0	34909
4	30406	110965.0	0	30406
5	37870	27741.3	0	37870
6	197	6935.3	0	197
7	9	1733.8	0	9
9	10	108.4	0	2 8
10	9	27.1	1	0 9
11	7	6.8	1	0 7
12	2	1.7	1	0 2

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts
3	216841	443860.0	0	216841
4	5102	110965.0	0	5102
5	775	27741.3	0	775
6	455	6935.3	0	455
7	20	1733.8	0	20
8	1	433.5	0	1
9	6	108.4	0	3 3
10	10	27.1	1	0 10
11	3	6.8	1	0 3

=== Adapter PE1 ===

Sequence: ACACTCTTTCCCTACACGACGCTCTTCCGATCT; Type: variable 5'/3'; Length: 33; Trimmed: 143891 times
52868 times, it overlapped the 5' end of a read
91023 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-33 bp: 3

Overview of removed sequences (5')
length	count	expect	max.err	error counts
3	47060	443860.0	0	47060
4	5322	110965.0	0	5322
5	312	27741.3	0	312
6	107	6935.3	0	107
7	26	1733.8	0	26
8	7	433.5	0	7
9	4	108.4	0	1 3
10	9	27.1	1	2 7
11	3	6.8	1	1 2
12	5	1.7	1	4 1
14	2	0.1	1	1 1
15	6	0.0	1	6
16	1	0.0	1	1
20	1	0.0	2	0 1
22	2	0.0	2	2
30	1	0.0	3	0 0 1

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts
3	83420	443860.0	0	83420
4	3245	110965.0	0	3245
5	3804	27741.3	0	3804
6	418	6935.3	0	418
7	104	1733.8	0	104
8	11	433.5	0	11
9	13	108.4	0	1 12
10	5	27.1	1	0 5
11	1	6.8	1	0 1
12	2	1.7	1	0 2

=== Adapter PE1_rc ===

Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT; Type: variable 5'/3'; Length: 33; Trimmed: 759337 times
612296 times, it overlapped the 5' end of a read
147041 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-33 bp: 3

Overview of removed sequences (5')
length	count	expect	max.err	error counts
3	608878	443860.0	0	608878
4	2680	110965.0	0	2680
5	563	27741.3	0	563
6	121	6935.3	0	121
7	30	1733.8	0	30
8	1	433.5	0	1
9	6	108.4	0	3 3
10	14	27.1	1	1 13
11	3	6.8	1	0 3

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts
3	136604	443860.0	0	136604
4	7118	110965.0	0	7118
5	2667	27741.3	0	2667
6	286	6935.3	0	286
7	17	1733.8	0	17
8	4	433.5	0	4
9	19	108.4	0	1 18
10	51	27.1	1	5 46
11	110	6.8	1	10 100
12	5	1.7	1	0 5
13	12	0.4	1	0 12
14	125	0.1	1	0 125
15	3	0.0	1	0 3
16	6	0.0	1	0 6
17	3	0.0	1	0 3
18	2	0.0	1	0 2
19	9	0.0	1	0 2 7

=== Adapter PE2 ===

Sequence: GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT; Type: variable 5'/3'; Length: 34; Trimmed: 796030 times
1 times, it overlapped the 5' end of a read
796029 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3

Overview of removed sequences (5')
length	count	expect	max.err	error counts
15	1	0.0	1	1

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts
3	557600	443860.0	0	557600
4	236288	110965.0	0	236288
5	1315	27741.3	0	1315
6	394	6935.3	0	394
7	58	1733.8	0	58
8	9	433.5	0	9
9	165	108.4	0	2 163
10	125	27.1	1	0 125
11	64	6.8	1	0 64
12	9	1.7	1	1 8
13	2	0.4	1	0 2

=== Adapter PE2_rc ===

Sequence: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC; Type: variable 5'/3'; Length: 34; Trimmed: 190109 times
186531 times, it overlapped the 5' end of a read
3578 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3

Overview of removed sequences (5')
length	count	expect	max.err	error counts
3	100883	443860.0	0	100883
4	71555	110965.0	0	71555
5	388	27741.3	0	388
6	148	6935.3	0	148
7	10	1733.8	0	10
8	12	433.5	0	12
9	12	108.4	0	1 11
10	62	27.1	1	0 62
11	6	6.8	1	0 6
22	5	0.0	2	0 4 1
27	13	0.0	2	0 1 12
28	134	0.0	2	16 81 37
29	13216	0.0	2	9517 3168 531
30	72	0.0	3	10 44 18
32	1	0.0	3	0 0 0 1
33	14	0.0	3	0 0 0 14

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts
14	24	0.1	1	3 21
15	17	0.0	1	2 15
16	325	0.0	1	6 319
17	22	0.0	1	4 18
18	435	0.0	1	2 433
19	781	0.0	1	56 724 1
20	96	0.0	2	44 14 38
21	81	0.0	2	8 6 67
22	137	0.0	2	1 87 49
23	37	0.0	2	4 0 33
24	23	0.0	2	8 1 14
25	35	0.0	2	8 20 7
26	95	0.0	2	6 11 78
27	256	0.0	2	1 16 239
28	258	0.0	2	0 0 258
29	880	0.0	2	0 0 880
30	48	0.0	3	0 2 28 18
31	17	0.0	3	0 0 1 16
32	5	0.0	3	0 0 0 5
33	6	0.0	3	0 0 1 5

=== Adapter Head ===

Sequence: CCCTACACGACGCTCTTCCGATCT; Type: variable 5'/3'; Length: 24; Trimmed: 59864 times
0 times, it overlapped the 5' end of a read
59864 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-24 bp: 2

Overview of removed sequences (5')
length	count	expect	max.err	error counts

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts
3	55406	443860.0	0	55406
4	3872	110965.0	0	3872
5	539	27741.3	0	539
6	32	6935.3	0	32
7	4	1733.8	0	4
8	2	433.5	0	2
10	3	27.1	1	0 3
11	4	6.8	1	0 4
12	1	1.7	1	0 1
45	1	0.0	2	0 1

=== Adapter Head_rc ===

Sequence: AGATCGGAAGAGCGTCGTGTAGGG; Type: variable 5'/3'; Length: 24; Trimmed: 124461 times
124461 times, it overlapped the 5' end of a read
0 times, it overlapped the 3' end or was within the read

Minimum overlap: 3
No. of allowed errors:
1-9 bp: 0; 10-19 bp: 1; 20-24 bp: 2

Overview of removed sequences (5')
length	count	expect	max.err	error counts
3	109992	443860.0	0	109992
4	6932	110965.0	0	6932
5	738	27741.3	0	738
6	131	6935.3	0	131
7	6647	1733.8	0	6647
8	6	433.5	0	6
9	4	108.4	0	0 4
10	3	27.1	1	1 2
11	1	6.8	1	0 1
12	7	1.7	1	0 7

Overview of removed sequences (3' or within)
length	count	expect	max.err	error counts

komaljain3 commented 1 year ago

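Main differences between the CGR pipeline command and the NEXTflex command:

1) The CGR command has a minimum length filter of -m 15; the NEXTflex command uses --minimum-length 23 (the same option as -m).
2) The CGR command has a maximum length filter of -M 31; the NEXTflex command has none.
3) The CGR command uses -q 10,10 to trim low-quality bases from both ends; the NEXTflex command does no quality trimming.
4) The CGR command searches for several other adapters in addition to the NEXTflex adapter.
5) The CGR command uses -b (adapter at either end), whereas the NEXTflex command uses -a (3' adapter only) and then removes 4 bases from each end with -u 4 -u -4.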

Results summary:

The results of this analysis are in the slurm_out files here: /DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/cutadapt_trim_test/cgr_cutadapt_cmd

1) When the CGR command was evaluated, it mainly trimmed the NEXTflex adapter in the forward orientation, which is in line with the expected behavior.

2) With the vendor command, adapters were found and trimmed in 0.1% of the reads; with the CGR command, in 10.5% of the reads. This difference was evaluated further and the CGR command was optimized.

Because the CGR pipeline supplies a large number of adapters, the minimum overlap length (3 bases by default) matters: a chance match of only 3-4 bases to any one of these adapters is enough for cutadapt to trim, and this caused a lot of random trimming of 3-4 bases at the ends of the reads. Not all of these adapters can actually be present on the reads, so the command needed optimization. Cutadapt's -O 5 option (require at least 5 overlapping bases) was added to remove this random trimming; with it, adapters were found in only 0.6% of the reads.
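The "expect" column in the cutadapt reports shows why short overlaps are a problem. For an overlap of k bases, the values listed equal N × 0.25^k, where N is the number of reads: the number of chance matches one would expect per adapter if every base matched with probability 1/4. For N = 28,407,041 that is about 443,860 at k = 3 but about 27,741 at k = 5, and the pipeline searches for many adapters at once. A small sketch that reproduces those numbers (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Expected number of reads with a chance match of k bases to one adapter,
# assuming each base matches with probability 1/4 (reproduces the "expect" column).
expected_random_matches() {
  local reads=$1 k=$2
  awk -v n="$reads" -v k="$k" 'BEGIN { printf "%.1f\n", n * 0.25 ^ k }'
}

for k in 3 4 5 6; do
  printf 'overlap %d bp: %s expected chance matches\n' "$k" "$(expected_random_matches 28407041 "$k")"
done
```

Raising -O to 5 means the 3- and 4-base rows, where chance matches dominate, are no longer trimmed at all.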

-b (original command)

=== Summary ===

Total reads processed:              28,407,041
Reads with adapters:                 2,981,126 (10.5%)

== Read fate breakdown ==
Reads that were too short:          10,094,132 (35.5%)
Reads that were too long:            2,474,224 (8.7%)
Reads written (passing filters):    15,838,685 (55.8%)

Total basepairs processed: 1,008,769,132 bp
Quality-trimmed:             519,656,268 bp (51.5%)
Total written (filtered):    380,479,308 bp (37.7%)

-b, -O 5 (remove random trimming)

=== Summary ===

Total reads processed:              28,407,041
Reads with adapters:                   175,284 (0.6%)

== Read fate breakdown ==
Reads that were too short:          10,094,132 (35.5%)
Reads that were too long:            2,847,285 (10.0%)
Reads written (passing filters):    15,465,624 (54.4%)

Total basepairs processed: 1,008,769,132 bp
Quality-trimmed:             519,656,268 bp (51.5%)
Total written (filtered):    376,076,296 bp (37.3%)

-b (adapter trimmed at either the 5' or the 3' end) was then compared with -a (3' adapter only).

New options: -a, -O 5

=== Summary ===

Total reads processed:              28,407,041
Reads with adapters:                   112,757 (0.4%)

== Read fate breakdown ==
Reads that were too short:          10,094,098 (35.5%)
Reads that were too long:            2,861,464 (10.1%)
Reads written (passing filters):    15,451,479 (54.4%)

Total basepairs processed: 1,008,769,132 bp
Quality-trimmed:             519,656,268 bp (51.5%)
Total written (filtered):    375,996,503 bp (37.3%)

There is not much difference in behavior between -a and -b (adapters found in 0.4% vs. 0.6% of reads). Because we don't expect adapters at the 5' end, -b was not trimming much from the 5' end, so the 3'-only -a is sufficient.

A large share of reads (35.5%) was filtered out as too short during our testing. This seemed to be due to the -q 10,10 option of cutadapt, which trims read ends with a quality below 10: the reads become shorter and are then removed by the -m 15 filter. When -q was removed, very few reads were filtered out as too short. However, we kept this option for two reasons:

1) Consistency with the old pipeline.
2) If read ends have poor quality, it is better to trim them.
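To see how -q 10,10 can push many reads below -m 15, here is a sketch of the 3' quality-trimming rule as described in the cutadapt documentation (a BWA-style algorithm): scanning from the 3' end, accumulate (cutoff − quality), stop once the running sum turns negative, and cut where the sum was largest. The function name is hypothetical; it takes already-decoded Phred scores rather than a FASTQ quality string and covers only the 3' end (with -q 10,10 cutadapt applies the same idea from the 5' end as well).

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of bases kept after 3' quality trimming.
# Usage: quality_trim_3p CUTOFF Q1 Q2 ... QN   (decoded Phred scores, 5'->3')
quality_trim_3p() {
  local cutoff=$1; shift
  local -a q=("$@")
  local n=${#q[@]} s=0 best=0 stop=${#q[@]} i
  for (( i = n - 1; i >= 0; i-- )); do
    s=$(( s + cutoff - q[i] ))
    if (( s < 0 )); then
      break            # good bases now outweigh the bad tail: stop scanning
    fi
    if (( s > best )); then
      best=$s
      stop=$i          # cutting here removes the most low-quality "mass" so far
    fi
  done
  echo "$stop"         # bases kept = index of the first removed base
}
```

For example, with cutoff 10 the qualities 42 40 26 27 8 7 11 4 2 3 keep only their first 4 bases: a single base of quality 11 inside a low-quality tail does not stop the trim.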

Another point to note is that when -q is removed, -M (the maximum-length limit) should be adjusted or removed: without quality trimming the reads are no longer shortened, so most of them are discarded as too long. The example is below:

No -q, with -M 31:

Command line parameters: -a file:adapters.fa -m 15 -M 31 -O 5 --too-short-output=trimmed/SC299521_ATCACG_O5_A_noq_too_short.fastq.gz --too-long-output=trimmed/SC299521_ATCACG_O5_A_noq_too_long.fastq.gz -o trimmed/SC299521_ATCACG_O5_A_noq.trim.fastq.gz /DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/sample_test_2018_data/Gencode_microRNA-seq/merged_fastq/SC299521_ATCACG.fastq.gz --cores=10
Processing single-end reads on 10 cores ...
Finished in 29.965 s (1.055 µs/read; 56.88 M reads/minute).

=== Summary ===

Total reads processed:              28,407,041
Reads with adapters:                    18,218 (0.1%)

== Read fate breakdown ==
Reads that were too short:                  14 (0.0%)
Reads that were too long:           28,404,378 (100.0%)
Reads written (passing filters):         2,649 (0.0%)

Total basepairs processed: 1,008,769,132 bp
Total written (filtered):         65,319 bp (0.0%)

No -q and no -M:

Command line parameters: -a file:adapters.fa -m 15 -O 5 --too-short-output=trimmed/SC299521_ATCACG_O5_A_noq_noupper_too_short.fastq.gz --too-long-output=trimmed/SC299521_ATCACG_O5_A_noq_noupper_too_long.fastq.gz -o trimmed/SC299521_ATCACG_O5_A_noq_noupper.trim.fastq.gz /DCEG/CGF/Research/RD168_Chernobyl_TN-Pairs/ANALYSIS_miR/2023-05-30-miRNA-pipeline-test/sample_test_2018_data/Gencode_microRNA-seq/merged_fastq/SC299521_ATCACG.fastq.gz --cores=10
Processing single-end reads on 10 cores ...
Finished in 30.207 s (1.063 µs/read; 56.42 M reads/minute).

=== Summary ===

Total reads processed:              28,407,041
Reads with adapters:                    18,218 (0.1%)

== Read fate breakdown ==
Reads that were too short:                  14 (0.0%)
Reads written (passing filters):    28,407,027 (100.0%)

Total basepairs processed: 1,008,769,132 bp
Total written (filtered):  1,008,608,364 bp (100.0%)
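Before choosing -M it helps to look at the raw read lengths. This sample's report gives 1,008,769,132 bp over 28,407,041 reads, about 35.5 bases per read on average, which is above 31, so without quality trimming nearly every read fails -M 31. A small awk-based sketch for tabulating read lengths from a FASTQ stream (hypothetical function names; assumes standard 4-line FASTQ records):

```shell
#!/usr/bin/env bash
# Print "length count" for every read length in a FASTQ stream on stdin.
# Assumes 4-line records (the sequence is line 2 of each record).
read_length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) print len, n[len] }' | sort -n
}

# Count reads longer than MAX (what -M MAX would reject if nothing were trimmed).
count_longer_than() {
  local max=$1
  awk -v max="$max" 'NR % 4 == 2 && length($0) > max { c++ } END { print c + 0 }'
}
```

Usage: `gzip -cd "$INPUT_FASTQ" | read_length_hist` or `gzip -cd "$INPUT_FASTQ" | count_longer_than 31`.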

Conclusion: the final cutadapt command is as follows:

cutadapt -a file:${ADAPTERS} -m 15 -M 31 -O 5 \
    --too-short-output=trimmed/${SAMPLE}_too_short.fastq.gz \
    --too-long-output=trimmed/${SAMPLE}_too_long.fastq.gz \
    -q 10,10 \
    -o trimmed/${SAMPLE}.trim.fastq.gz ${INPUT_FASTQ} \
    --cores=$SLURM_CPUS_PER_TASK \
    2>log/${SAMPLE}_cutadapt.err
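If the final command ever needs to be run over several samples outside the workflow, a loop like the following could wrap it. This is a sketch only: the function name `trim_all`, the `DRY_RUN=1` switch (print instead of run) and the assumption that all inputs are `*.fastq.gz` files in one directory are hypothetical; inside the pipeline, sample names and paths come from the workflow itself.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: apply the final cutadapt command to every *.fastq.gz in a directory.
# Usage: trim_all FASTQ_DIR ADAPTERS_FASTA [CORES]   (set DRY_RUN=1 to only print commands)
trim_all() {
  local fastq_dir=$1 adapters=$2
  local cores=${3:-${SLURM_CPUS_PER_TASK:-1}}
  local input sample
  local -a cmd
  mkdir -p trimmed log
  for input in "$fastq_dir"/*.fastq.gz; do
    [[ -e $input ]] || continue          # the glob matched nothing
    sample=$(basename "$input" .fastq.gz)
    cmd=(
      cutadapt -a "file:${adapters}" -m 15 -M 31 -O 5
      --too-short-output="trimmed/${sample}_too_short.fastq.gz"
      --too-long-output="trimmed/${sample}_too_long.fastq.gz"
      -q 10,10
      -o "trimmed/${sample}.trim.fastq.gz" "$input"
      --cores="$cores"
    )
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" 2>"log/${sample}_cutadapt.err"   # same log naming as above
    fi
  done
}
```

Building the command as a bash array keeps paths with unusual characters intact and lets the same list be printed for review or executed.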

NCI-CGR / Gencode_microRNA-seq