Closed XLZH closed 4 years ago
Hello @smzt,
pTrimmer supports both the "long amplicon size" case (normal condition) and the "short amplicon size" case (read-through condition). The amplicon you provided is, in fact, a read-through case. But the part of the opposite primer present at the 3'-end is too short (even shorter than one k-mer) to be located! To obtain accurate results, we prefer to discard such reads.
The match condition of your reads is as follows:
fastq1: opposite primer at 3'-end only has 3 bases (ACA)
fastq2: opposite primer at 3'-end only has 7 bases (CTCCTGG)
----- fastq 1 -----
@M03970:332:000000000-J2CK5:1:1101:16151:2887 1:N:0:15
GTCCAGCTTTGTGCCAGGAG CCTCGCAGGGGTTGATGGGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTTGAGCTTCTCAAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGG ACA
GTCCAGCTTTGTGCCAGGAG ACACTTTGCGTTCGGGCT
----- fastq 2 -----
@M03970:332:000000000-J2CK5:1:1101:21721:3033 2:N:0:15
AGCCCGAACGCAAAGTGT CCCCGGAGCCCAGCAGCTACCTGCTCCCTGGACGGTGGCTCTAGACTTTTGAGAAGCTCAAAACTTTTAGCGCCAGTCTTGAGCACATGGGAGGGGAAAACCCCAATCCCATCAACCCCTGCGAGG CTCCTGG
AGCCCGAACGCAAAGTGT CTCCTGGCACAAAGCTGGAC
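The 3 and 7 base counts above can be reproduced with a small Python sketch. It only illustrates the idea of a partial opposite primer at the 3'-end and is not pTrimmer's code: it assumes exact matching, the function names are made up for this example, and only the last bases of each read are used.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def partial_primer_length(read: str, opposite_primer: str) -> int:
    """Length of the longest read suffix that equals a prefix of the
    reverse-complemented opposite primer (exact matching only)."""
    target = reverse_complement(opposite_primer)
    for n in range(min(len(read), len(target)), 0, -1):
        if read.endswith(target[:n]):
            return n
    return 0


# Primer pair from amplicon_primers.txt in this thread.
PRIMER_A = "AGCCCGAACGCAAAGTGT"
PRIMER_B = "GTCCAGCTTTGTGCCAGGAG"

# 3'-end tails of the two reads shown above (not the full reads).
fastq1_tail = "GCTGCTGGGCTCCGGGGACA"  # read starts with PRIMER_B
fastq2_tail = "CCCCTGCGAGGCTCCTGG"    # read starts with PRIMER_A

print(partial_primer_length(fastq1_tail, PRIMER_A))  # 3
print(partial_primer_length(fastq2_tail, PRIMER_B))  # 7
```

Both values are below the default k-mer length of 8, which is consistent with the reads being discarded.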
Hi Xiaolong, That's the reason why I contacted you. This issue is very common when analyzing amplicons with NGS: in some of them the full sequence of the primer is not present at the 3'-end. It is a pity your tool does not include an option either to remove these small parts of the primer at the 3'-end or at least to retain these reads.
Thank you very much for your quick reply.
Regards,
Sheila
Hi Sheila,
pTrimmer is able to process most 'partial read-through' primer sequences at the 3'-end. But the partial primer sequence at the 3'-end must be at least one k-mer long (default: 8), which is a relatively safe strategy for preventing wrong matches.
For the amplicon you provided: (1) You can set the parameter '-k|--kmer' to 7 to process most of the reads, such as the one shown below (7 bases of read-through).
----- fastq 2 -----
@M03970:332:000000000-J2CK5:1:1101:21721:3033 2:N:0:15
AGCCCGAACGCAAAGTGT CCCCGGAGCCCAGCAGCTACCTGCTCCCTGGACGGTGGCTCTAGACTTTTGAGAAGCTCAAAACTTTTAGCGCCAGTCTTGAGCACATGGGAGGGGAAAACCCCAATCCCATCAACCCCTGCGAGG CTCCTGG
AGCCCGAACGCAAAGTGT CTCCTGGCACAAAGCTGGAC
(2) There is also a parameter '-l|--keep' to retain reads in which the primer sequence could not be located.
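As a rough illustration of how the two options interact, here is a minimal Python sketch. It is not pTrimmer's implementation: it assumes exact matching and assumes a partial primer is accepted when it is at least `kmer` bases long. The `classify` and `chance_match_probability` helpers and their return values are hypothetical names chosen for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def partial_primer_length(read: str, opposite_primer: str) -> int:
    """Longest read suffix equal to a prefix of the reverse-complemented
    opposite primer (exact matching only)."""
    target = reverse_complement(opposite_primer)
    for n in range(min(len(read), len(target)), 0, -1):
        if read.endswith(target[:n]):
            return n
    return 0


def classify(read: str, opposite_primer: str, kmer: int = 8, keep: bool = False) -> str:
    """Assumed decision rule: trim if the partial primer is at least `kmer`
    bases long; otherwise retain the read only when `keep` is set."""
    if partial_primer_length(read, opposite_primer) >= kmer:
        return "trim"
    return "retain" if keep else "discard"


def chance_match_probability(n: int) -> float:
    """Probability that n random bases equal a given n-base sequence,
    assuming independent, uniformly distributed bases."""
    return 0.25 ** n


PRIMER_A = "AGCCCGAACGCAAAGTGT"
PRIMER_B = "GTCCAGCTTTGTGCCAGGAG"
fastq1_tail = "GCTGCTGGGCTCCGGGGACA"  # 3 bases of read-through
fastq2_tail = "CCCCTGCGAGGCTCCTGG"    # 7 bases of read-through

print(classify(fastq2_tail, PRIMER_B, kmer=8))             # discard
print(classify(fastq2_tail, PRIMER_B, kmer=7))             # trim
print(classify(fastq1_tail, PRIMER_A, kmer=7))             # discard
print(classify(fastq1_tail, PRIMER_A, kmer=7, keep=True))  # retain
print(chance_match_probability(3))                         # 0.015625
```

Under these assumptions a 3-base tail matches by chance about once in 64 random sequences, while a 7-base tail does so about once in 16384, which shows why a minimum length guards against wrong matches.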
Best, Xiaolong Zhang
Dear developers, Is it possible that the software discards paired reads in which both primers are not found? If the amplicon is large, some of the reads will not contain the opposite primer at the 3'-end, or they may contain just a small part of it. One of my amplicons is being completely missed and NNNNN is reported in the final files. Here are 3 reads for that amplicon (paired-end).
r1.fastq
@M03970:332:000000000-J2CK5:1:1101:16151:2887 1:N:0:15
GTCCAGCTTTGTGCCAGGAGCCTCGCAGGGGTTGATGGGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTTGAGCTTCTCAAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACA
+
r2.fastq
@M03970:332:000000000-J2CK5:1:1101:16151:2887 2:N:0:15
AGCCCGAACGCAAAGTGTCCCCGGAGCCCAGCAGCTACCTGCTCCCTGGACGGTGGCTCTAGCCTTTTGAGAAGCTCAAAACTTTTAGCGCCAGTCTTGAGCACATGGGAGGGGAACACCCCAATCCCATCAACCCCTGCGAGGCTACTGG
+
?AA@ADDDDDADG?BGG3FGGGGE0/AFE/EFB0GF1FFDDHBGHGFCG0BE/>EEHCGF1B1B@11B1//>B0F1G1@11BFFGD2>GEGGCEBFHEFFBFHFH1F0/0?CCC///<@EGG?/FFHHFHHF1FGGGE..<<<-CC0<:CG
@M03970:332:000000000-J2CK5:1:1101:21721:3033 2:N:0:15
AGCCCGAACGCAAAGTGTCCCCGGAGCCCAGCAGCTACCTGCTCCCTGGACGGTGGCTCTAGACTTTTGAGAAGCTCAAAACTTTTAGCGCCAGTCTTGAGCACATGGGAGGGGAAAACCCCAATCCCATCAACCCCTGCGAGGCTCCTGG
+
ABBBBB?ADDBB?CGGGGGGGGGGGGGGGFFHHHHHGHHHGHHHHGHHHBHGGEHGHHHHBG4FFHHHHAF?4BFHHB@@3FGHHH4EGGGGGGGHHHHDHHHHFGFHGGFGGGGG/FHGGGA/GHHHFHH11GGGGGGHGCDCG.CGHGH
@M03970:332:000000000-J2CK5:1:1101:14369:3115 2:N:0:15
AGCCCGAACGCAAAGTGTCCCCGGAGCCCAGCAGCTACCTGCTCCCTGGACGGTGGCTCTAGGCTTTTGAGAAGCTCAAAACTTTTAGCGCCAGTCTTGAGCACATGGGAGGGGAAAACCCCAATCCCATCAACCCCTGCGAGGCTCCTGG
+
AB@AABBBBBBBGGFGFGGGGGGGCGGGGHHHHGHHHHHHHHHHHGHHHHHGGEGDHHHHFHHHHHHFHFGHHHHHHHFFFHHHHHHHHGGGGGHHHHHHHHHHHHHHGHGGGGGGHHHGGGGGHHHHHHHHHHHGGGGHGGGGGFHGHHH
Primers used to amplify the region:
amplicon_primers.txt
AGCCCGAACGCAAAGTGT GTCCAGCTTTGTGCCAGGAG 125
Thanks very much in advance.
Regards,
Sheila
Originally posted by @smzt in https://github.com/DMU-lilab/pTrimmer/issues/3#issuecomment-637606797