OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

Not detecting the adapter in miRNAseq #129

Open ndaniel opened 5 years ago

ndaniel commented 5 years ago

It looks like fastp cannot automatically detect the adapter at all for miRNA-seq data.

For example, fastp fails to automatically detect the adapter in the single-end FASTQ file from https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR5087522

fastp v0.19.6 was run as: fastp -i SRR5087522.fq -o test.fq

The first 3 input reads look like this:

@SRR5087522.1
TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJ
@SRR5087522.2
NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC
+
#4BDFFFFHHHHHJJJJJJJJJJJJIJJJJIJIJCBHIJJJJJJJJJJJJ
@SRR5087522.3
NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG
+
#1=DDFFFHHHHHJJJJJJJJIIJFGHHIJJIEHHHACDFFFFFEDADDD

FastQC shows that this FASTQ file most likely contains the Illumina Small RNA 3' adapter, which according to FastQC's contaminant list (https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt) is ATCTCGTATGCCGTCTTCTGCTTG.

According to Illumina's official adapter sequences document (https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/experiment-design/illumina-adapter-sequences-1000000002694-09.pdf), these are all the Illumina small RNA adapters:

>Illumina Small RNA v1.5 3p Adapter
ATCTCGTATGCCGTCTTCTGCTTG
>Illumina RNA 3p Adapter (RA3)
TGGAATTCTCGGGTGCCAAGG
>Illumina RNA 5p Adapter (RA5)
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina 5p RNA Adapter
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina 3p RNA Adapter
TCGTATGCCGTCTTCTGCTTGT
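
A quick way to see which of the adapters listed above actually occurs in the three example reads is a plain substring search. The sketch below is illustrative only: the 12-base seed length and the short adapter labels are assumptions made for this example, and it is not fastp's or FastQC's detection logic.

```python
# Sequence lines of the three reads quoted above.
reads = [
    "TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA",
    "NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC",
    "NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG",
]

# Adapters from the list above; labels are shortened for this example and
# the two identical 5p entries are collapsed into one.
adapters = {
    "Small RNA v1.5 3p": "ATCTCGTATGCCGTCTTCTGCTTG",
    "RA3": "TGGAATTCTCGGGTGCCAAGG",
    "RA5": "GTTCAGAGTTCTACAGTCCGACGATC",
    "3p RNA": "TCGTATGCCGTCTTCTGCTTGT",
}

SEED = 12  # assumed seed length: the first 12 bases of each adapter


def seed_hits(reads, adapters, seed=SEED):
    """Map each adapter label to the offset of its seed in every read (-1 if absent)."""
    return {
        name: [read.find(seq[:seed]) for read in reads]
        for name, seq in adapters.items()
    }


hits = seed_hits(reads, adapters)
for name, positions in hits.items():
    print(name, positions)
```

On these three reads only the RA3 seed is found, at offset 23 in reads 1 and 2. Read 3 carries the adapter only from its second base onward (GGAATTCTCGG...), so an exact seed search misses it.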

sfchen commented 5 years ago

I will update the adapter detection feature in the next release; please help test it then.

Thanks

ndaniel commented 5 years ago

Ok.

harish0201 commented 3 years ago

Hey, just wanted to let you know that adapter auto-detection still doesn't work at times.

I think it would be better to introduce something like an error rate for adapter matching, similar to Atropos. I think they implemented that to cope with systematic biases and 3' error rates.

Currently, I'm parsing the dnapi.py results and feeding them to fastp for trimming.
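
To make the error-rate idea concrete, here is a minimal sketch of mismatch-tolerant adapter matching, run on the three reads quoted at the top of the thread. It is a simple Hamming-distance scan with assumed defaults (`max_error_rate=0.1`, `min_overlap=8`) and hypothetical function names; it is neither the algorithm Atropos uses (which also handles indels) nor what fastp does internally.

```python
RA3 = "TGGAATTCTCGGGTGCCAAGG"  # Illumina RNA 3p Adapter (RA3) from the list above

reads = [
    "TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA",
    "NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC",
    "NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG",
]


def find_adapter(read, adapter, max_error_rate=0.1, min_overlap=8):
    """Return the leftmost offset where the adapter (or a prefix of it that
    runs off the 3' end of the read) matches with at most
    int(max_error_rate * overlap) mismatches, or -1 if there is none."""
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        allowed = int(max_error_rate * overlap)
        mismatches = 0
        for a, b in zip(adapter, read[start:start + overlap]):
            if a != b:
                mismatches += 1
                if mismatches > allowed:
                    break  # too many mismatches at this offset
        else:
            return start  # inner loop finished without exceeding the budget
    return -1


def trim(read, adapter, **kwargs):
    """Cut the read at the adapter offset; return it unchanged if no match."""
    pos = find_adapter(read, adapter, **kwargs)
    return read if pos < 0 else read[:pos]


exact = [find_adapter(r, RA3, max_error_rate=0.0) for r in reads]
tolerant = [find_adapter(r, RA3) for r in reads]
print("exact:   ", exact)
print("tolerant:", tolerant)
print([trim(r, RA3) for r in reads])
```

With zero tolerance, only read 2 yields a match, because read 1 differs from RA3 in the adapter's last base and read 3 in its first. Allowing 10% mismatches recovers the adapter in all three reads.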

mdtorohernando commented 5 months ago

Hi! I was trying fastp with small RNA data and, indeed, fastp does not detect these adapters.

mdtorohernando commented 5 months ago

In fact, I'm trying to supply a FASTA file with the adapters to check, and I get this error:

ERROR: the adapter can only have bases in {A, T, C, G}, but the given sequence is: adapters_miRNAs_illumima.fasta

I've attached the FASTA file.
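
The error text suggests that the file name itself was checked as if it were a literal adapter sequence, which is what fastp's `-a` / `--adapter_sequence` option expects. The sketch below assumes that reading of the message: it mimics the base check, pulls one sequence out of FASTA text, and builds a command line that passes the sequence rather than the file. The helper names and the two-record FASTA text are hypothetical. Recent fastp releases also document an `--adapter_fasta` option for passing a whole file; whether the installed version has it is worth checking in `fastp --help`.

```python
import re

ADAPTER_RE = re.compile(r"[ATCG]+")


def is_literal_adapter(value):
    """True if the value consists only of the bases A, T, C and G."""
    return ADAPTER_RE.fullmatch(value) is not None


def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}, joining wrapped lines."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


# Hypothetical file content, using two adapters listed earlier in the thread.
fasta_text = """>Illumina RNA 3p Adapter (RA3)
TGGAATTCTCGGGTGCCAAGG
>Illumina RNA 5p Adapter (RA5)
GTTCAGAGTTCTACAGTCCGACGATC
"""

records = parse_fasta(fasta_text)
ra3 = records["Illumina RNA 3p Adapter (RA3)"]

# A file name fails the base check; the extracted sequence passes it.
print(is_literal_adapter("adapters_miRNAs_illumima.fasta"))
print(is_literal_adapter(ra3))

# Command line that passes the literal sequence instead of the file name.
cmd = ["fastp", "-i", "SRR5087522.fq", "-o", "test.fq", "-a", ra3]
print(" ".join(cmd))
```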
