OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

ab-initio adapter detection - incorrect adapter classification #240

Open cpreviti opened 4 years ago

cpreviti commented 4 years ago

Dear developers,

When I check our RNA-seq data for adapters (we expect the Illumina TruSeq adapters for Read 1 and Read 2), fastp reports the correct adapters in some cases, but sometimes also the following ones: Nextera_LMP_Read1_External_Adapter / Nextera_LMP_Read2_External_Adapter.

The only difference between the Illumina TruSeq and Nextera LMP adapters is a single A at the beginning of the adapter sequence, which is missing in the Nextera ones. The easiest solution would be to remove the LMP adapters from the list, since the protocol is no longer used. But it may also be a bug...

Best regards, Christopher Previti
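To illustrate the ambiguity described above, here is a minimal Python sketch. It is not fastp's actual classification code. The adapter labels are illustrative, the TruSeq Read 1 sequence is the one shown in the report in this thread, and the Nextera LMP entry is assumed (per the description above) to be the same sequence without the leading A. The helper names `substring_matches` and `longest_match` are hypothetical.

```python
# Sequences as described in this issue. The LMP entry is ASSUMED to be the
# TruSeq sequence minus its leading "A"; the labels are illustrative only.
TRUSEQ_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
KNOWN_ADAPTERS = {
    "Illumina TruSeq Adapter Read 1": TRUSEQ_R1,
    "Nextera_LMP_Read1_External_Adapter": TRUSEQ_R1[1:],
}


def substring_matches(detected, known):
    """Return the names of all known adapters that occur inside `detected`."""
    return [name for name, seq in known.items() if seq in detected]


def longest_match(detected, known):
    """Return the name of the longest known adapter found in `detected`, or None."""
    hits = substring_matches(detected, known)
    return max(hits, key=lambda name: len(known[name])) if hits else None


# Longest sequence listed under read1_adapter_counts in the report.
detected = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# Both list entries are substrings of the detected sequence, so a lookup that
# stops at an arbitrary hit can report either one.
print(substring_matches(detected, KNOWN_ADAPTERS))
# Preferring the longest hit resolves the tie in favour of TruSeq.
print(longest_match(detected, KNOWN_ADAPTERS))
```

Preferring the longest hit is only one possible tie-break; removing the LMP entries from the list, as suggested above, would avoid the ambiguity altogether.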

cpreviti commented 4 years ago

Just as an example, in case I didn't explain myself well: this is the output of an ab-initio trimming test that I performed. What you (presumably) see is the adapter sequence taken from your list, as well as the sequences that the program detects. The detection works perfectly fine! But the detected sequence is then classified as the wrong adapter (the reported sequence is just a substring of the correctly detected one):

"adapter_cutting": {
  "adapter_trimmed_reads": 15252489,
  "adapter_trimmed_bases": 275911139,
  "read1_adapter_sequence": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
  "read2_adapter_sequence": "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
  "read1_adapter_counts": {"A":258256, "AG":255461, "AGA":251408, "AGAT":252294, "AGATC":266313, "AGATCG":250919, "AGATCGG":246263, "AGATCGGA":244905, "AGATCGGAA":239915, "AGATCGGAAG":238869, "AGATCGGAAGA":232727, "AGATCGGAAGAG":228236, "AGATCGGAAGAGC":222700, "AGATCGGAAGAGCA":215512, "AGATCGGAAGAGCAC":209235, "AGATCGGAAGAGCACA":206374, "AGATCGGAAGAGCACAC":207388, "AGATCGGAAGAGCACACG":199146, "AGATCGGAAGAGCACACGT":189425, "AGATCGGAAGAGCACACGTC":180821, "AGATCGGAAGAGCACACGTCT":174541, "AGATCGGAAGAGCACACGTCTG":164492, "AGATCGGAAGAGCACACGTCTGA":155202, "AGATCGGAAGAGCACACGTCTGAA":147726, "AGATCGGAAGAGCACACGTCTGAAC":140842, "AGATCGGAAGAGCACACGTCTGAACT":132633, "AGATCGGAAGAGCACACGTCTGAACTC":127846, "AGATCGGAAGAGCACACGTCTGAACTCC":121901, "AGATCGGAAGAGCACACGTCTGAACTCCA":114267, "AGATCGGAAGAGCACACGTCTGAACTCCAG":106994, "AGATCGGAAGAGCACACGTCTGAACTCCAGT":112076, "AGATCGGAAGAGCACACGTCTGAACTCCAGTC":100202, "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA":83444, "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC":77601, "others":1284915},
  "read2_adapter_counts": {"A":254429, "AG":251422, "AGA":248266, "AGAT":248779, "AGATC":244706, "AGATCG":245271, "AGATCGG":241971, "AGATCGGA":241349, "AGATCGGAA":236832, "AGATCGGAAG":235637, "AGATCGGAAGA":229437, "AGATCGGAAGAG":225478, "AGATCGGAAGAGC":219571, "AGATCGGAAGAGCG":212924, "AGATCGGAAGAGCGT":205831, "AGATCGGAAGAGCGTC":203689, "AGATCGGAAGAGCGTCG":205204, "AGATCGGAAGAGCGTCGT":195968, "AGATCGGAAGAGCGTCGTG":187239, "AGATCGGAAGAGCGTCGTGT":178116, "AGATCGGAAGAGCGTCGTGTA":172266, "AGATCGGAAGAGCGTCGTGTAG":162462, "AGATCGGAAGAGCGTCGTGTAGG":153343, "AGATCGGAAGAGCGTCGTGTAGGG":145902, "AGATCGGAAGAGCGTCGTGTAGGGA":139406, "AGATCGGAAGAGCGTCGTGTAGGGAA":131260, "AGATCGGAAGAGCGTCGTGTAGGGAAA":126247, "AGATCGGAAGAGCGTCGTGTAGGGAAAG":120275, "AGATCGGAAGAGCGTCGTGTAGGGAAAGA":112856, "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG":105885, "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGT":98076, "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG":91036, "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT":82723, "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTT":76530, "others":1381254}

Please let me know if I can help in any way! Best regards,

Christopher
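The substring relationship in the report above can be checked mechanically. The following Python sketch takes the reported `read1_adapter_sequence` and `read2_adapter_sequence` and, for each, finds where it sits inside the longest sequence listed in the matching `*_adapter_counts` object. Only the last entries of each counts object are copied here for brevity. The helper names `longest_detected` and `offset_in_detected` are hypothetical, and treating the longest counts key as "the detected sequence" is an assumption made for this illustration, not a statement about fastp's internals.

```python
def longest_detected(counts):
    """Return the longest sequence key in a counts object, ignoring "others"."""
    keys = [key for key in counts if key != "others"]
    return max(keys, key=len)


def offset_in_detected(reported, detected):
    """Return the 0-based position of `reported` inside `detected`, or -1."""
    return detected.find(reported)


# Values copied from the report in this thread (counts objects truncated).
read1_reported = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
read2_reported = "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
read1_counts = {
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA": 83444,
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC": 77601,
    "others": 1284915,
}
read2_counts = {
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT": 82723,
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTT": 76530,
    "others": 1381254,
}

read1_offset = offset_in_detected(read1_reported, longest_detected(read1_counts))
read2_offset = offset_in_detected(read2_reported, longest_detected(read2_counts))

# An offset of 0 means the reported adapter starts with the same base as the
# detected sequence; an offset of 1 means the leading base was dropped.
print("read1 offset:", read1_offset)
print("read2 offset:", read2_offset)
```

For the values above, the Read 1 sequence starts at offset 0, while the Read 2 sequence starts at offset 1, which matches the missing leading A described in the first comment.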