Open cpreviti opened 4 years ago
Just as an example, in case I didn't explain myself well: this is the output of an ab initio trimming test that I performed. What you (presumably) see is the adapter sequence that you get from your list, as well as the sequences that the program detects. The detection works perfectly fine! However, the sequence reported as read2_adapter_sequence is assigned to the wrong adapter: it is just a substring of the correctly detected sequence, with the leading A missing.

"adapter_cutting": {
    "adapter_trimmed_reads": 15252489,
    "adapter_trimmed_bases": 275911139,
    "read1_adapter_sequence": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    "read2_adapter_sequence": "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    "read1_adapter_counts": {
        "A":258256,
        "AG":255461,
        "AGA":251408,
        "AGAT":252294,
        "AGATC":266313,
        "AGATCG":250919,
        "AGATCGG":246263,
        "AGATCGGA":244905,
        "AGATCGGAA":239915,
        "AGATCGGAAG":238869,
        "AGATCGGAAGA":232727,
        "AGATCGGAAGAG":228236,
        "AGATCGGAAGAGC":222700,
        "AGATCGGAAGAGCA":215512,
        "AGATCGGAAGAGCAC":209235,
        "AGATCGGAAGAGCACA":206374,
        "AGATCGGAAGAGCACAC":207388,
        "AGATCGGAAGAGCACACG":199146,
        "AGATCGGAAGAGCACACGT":189425,
        "AGATCGGAAGAGCACACGTC":180821,
        "AGATCGGAAGAGCACACGTCT":174541,
        "AGATCGGAAGAGCACACGTCTG":164492,
        "AGATCGGAAGAGCACACGTCTGA":155202,
        "AGATCGGAAGAGCACACGTCTGAA":147726,
        "AGATCGGAAGAGCACACGTCTGAAC":140842,
        "AGATCGGAAGAGCACACGTCTGAACT":132633,
        "AGATCGGAAGAGCACACGTCTGAACTC":127846,
        "AGATCGGAAGAGCACACGTCTGAACTCC":121901,
        "AGATCGGAAGAGCACACGTCTGAACTCCA":114267,
        "AGATCGGAAGAGCACACGTCTGAACTCCAG":106994,
        "AGATCGGAAGAGCACACGTCTGAACTCCAGT":112076,
        "AGATCGGAAGAGCACACGTCTGAACTCCAGTC":100202,
        "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA":83444,
        "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC":77601,
        "others":1284915
    },
    "read2_adapter_counts": {
        "A":254429,
        "AG":251422,
        "AGA":248266,
        "AGAT":248779,
        "AGATC":244706,
        "AGATCG":245271,
        "AGATCGG":241971,
        "AGATCGGA":241349,
        "AGATCGGAA":236832,
        "AGATCGGAAG":235637,
        "AGATCGGAAGA":229437,
        "AGATCGGAAGAG":225478,
        "AGATCGGAAGAGC":219571,
        "AGATCGGAAGAGCG":212924,
        "AGATCGGAAGAGCGT":205831,
        "AGATCGGAAGAGCGTC":203689,
        "AGATCGGAAGAGCGTCG":205204,
        "AGATCGGAAGAGCGTCGT":195968,
        "AGATCGGAAGAGCGTCGTG":187239,
        "AGATCGGAAGAGCGTCGTGT":178116,
        "AGATCGGAAGAGCGTCGTGTA":172266,
        "AGATCGGAAGAGCGTCGTGTAG":162462,
        "AGATCGGAAGAGCGTCGTGTAGG":153343,
        "AGATCGGAAGAGCGTCGTGTAGGG":145902,
        "AGATCGGAAGAGCGTCGTGTAGGGA":139406,
        "AGATCGGAAGAGCGTCGTGTAGGGAA":131260,
        "AGATCGGAAGAGCGTCGTGTAGGGAAA":126247,
        "AGATCGGAAGAGCGTCGTGTAGGGAAAG":120275,
        "AGATCGGAAGAGCGTCGTGTAGGGAAAGA":112856,
        "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG":105885,
        "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGT":98076,
        "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG":91036,
        "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT":82723,
        "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTT":76530,
        "others":1381254
    }
}

Please let me know if I can help in any way! Best regards,
Christopher
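For anyone who wants to confirm the substring relationship in the report above, here is a minimal Python sketch. Both sequences are copied from the JSON output; the helper name `offset_in` is made up for illustration and is not part of fastp.

```python
# Values copied from the fastp JSON report quoted above.
reported_read2 = "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"    # "read2_adapter_sequence"
observed_read2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"   # a key in "read2_adapter_counts"


def offset_in(observed: str, reported: str) -> int:
    """Return the index at which `reported` first occurs in `observed`, or -1.

    Hypothetical helper for illustration only.
    """
    return observed.find(reported)


offset = offset_in(observed_read2, reported_read2)
# Bases of the observed adapter that precede the reported sequence.
missing_prefix = observed_read2[:offset] if offset > 0 else ""

print(f"reported sequence starts at offset {offset} of the observed one")
print(f"bases missing at the start: {missing_prefix!r}")
```

If the reported sequence were the full adapter, the offset would be 0 and nothing would be missing at the start.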
Dear developers,

When I check our RNA-seq data for adapters (we are expecting the Illumina TruSeq adapters for Read 1 and Read 2), fastp detects a mix of the correct adapters but sometimes also the following ones: Nextera_LMP_Read1_External_Adapter / Nextera_LMP_Read2_External_Adapter.

The only difference between the Illumina TruSeq and the Nextera LMP adapters is a single A at the beginning of the adapter sequence, which is missing in the Nextera ones. The easiest solution would be to remove the LMP adapters from the list, since that protocol is no longer in use. But it may also be a bug...
Best regards, Christopher Previti
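To illustrate why two list entries that differ only by a leading A are hard to tell apart, here is a small Python sketch. It is not fastp's actual matching logic. The two-entry list, the label `TruSeq_Read2` and the functions `matching_adapters` and `pick_longest` are assumptions made up for this example; the TruSeq sequence is taken from the report in this thread, and the LMP entry is derived from it by dropping the first base, as described above.

```python
# TruSeq Read 2 adapter as it appears in the report in this thread.
TRUSEQ_READ2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Hypothetical two-entry adapter list. The second entry is the first one
# without its leading A, which is the difference described in the issue.
KNOWN_ADAPTERS = {
    "TruSeq_Read2": TRUSEQ_READ2,
    "Nextera_LMP_Read2_External_Adapter": TRUSEQ_READ2[1:],
}


def matching_adapters(read_tail: str) -> list[str]:
    """Names of all known adapters that occur somewhere in `read_tail`."""
    return [name for name, seq in KNOWN_ADAPTERS.items() if seq in read_tail]


def pick_longest(names: list[str]) -> str:
    """One possible tie-break: prefer the longest matching adapter."""
    return max(names, key=lambda name: len(KNOWN_ADAPTERS[name]))


# A made-up read tail: a few insert bases followed by the full TruSeq adapter.
read_tail = "CCGTA" + TRUSEQ_READ2

hits = matching_adapters(read_tail)
print(hits)                # both entries match the same read
print(pick_longest(hits))  # the longer entry wins under this tie-break
```

Any read that carries the full TruSeq adapter also contains the shorter entry, so a lookup that accepts the first hit can report either name. Preferring the longest match, or removing the LMP entries as suggested above, would avoid the ambiguity in this sketch.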