xyw1 opened this issue 5 years ago

Hello, I tried your tool on both a UMI-tagged BAM file, in which the UMI sequences are soft-clipped, and a UMI-trimmed BAM file. I found that after trimming the UMI sequences, both the number of supporting reads (the SU tag) and the run time increase dramatically. Is there any procedure that ignores reads according to their soft-clip length?

How many bases are being trimmed?
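For context, the soft-clipped length of a read can be computed directly from its CIGAR string (column 6 of a SAM record), so a pre-filter could drop heavily clipped reads before running LUMPY. A minimal sketch in plain Python; the 30 bp threshold and the function names are illustrative assumptions, not part of LUMPY:

```python
import re

def soft_clip_length(cigar: str) -> int:
    """Total number of soft-clipped bases (the 'S' operations) in a CIGAR string."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op == "S")

def keep_read(sam_line: str, max_clip: int = 30) -> bool:
    """Keep a SAM record only if its total soft clip is at most max_clip bases."""
    cigar = sam_line.split("\t")[5]
    return cigar != "*" and soft_clip_length(cigar) <= max_clip

# The 21S130M record discussed in this thread has 21 soft-clipped bases:
print(soft_clip_length("21S130M"))    # -> 21
print(soft_clip_length("18S62M50S"))  # -> 68
```

Such a filter would have to be applied to the SAM stream (e.g. between `samtools view` and re-compression) before `lumpy_filter` sees the reads.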
21 bp
I am surprised that after trimming the run time increases. Can you explain to me how you trim?
I guess the run-time increase is related to the increase in the discordant and split read counts. Actually, the problem I care more about is why LUMPY omits so many reads when they are UMI-tagged (soft-clipped).
| SampleID | reads_n | discordant_reads_n | split_mapping_reads_n | run_time_seconds |
|---|---|---|---|---|
| sample_1_full_length | 2789185 | 349618 | 1080174 | 29.16 |
| sample_1_trim_UMI_21 | 2005583 | 585602 | 1713312 | 172558.96 |
| sample_1_trim_UMI_39 | 1987282 | 599890 | 1676850 | 159510.22 |
| sample_2_full_length | 2298425 | 371700 | 881921 | 37.71 |
| sample_2_trim_UMI_21 | 97560 | 18449 | 47248 | 194.84 |
| sample_2_trim_UMI_39 | 1734054 | 538511 | 1427644 | 123412.39 |
| sample_3_full_length | 146236 | 16200 | 44837 | 2.56 |
| sample_3_trim_UMI_21 | 142262 | 31385 | 88046 | 372.59 |
| sample_3_trim_UMI_39 | 141628 | 34751 | 86782 | 383.34 |
| sample_4_full_length | 7267 | 46 | 15 | 0.76 |
| sample_4_trim_UMI_21 | 7220 | 53 | 28 | 0.68 |
| sample_4_trim_UMI_39 | 7219 | 53 | 26 | 0.77 |
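The trend in the table can be quantified with a quick calculation: for sample_1, the fraction of reads classified as discordant roughly doubles after trimming, and the split-read fraction more than doubles (numbers taken from the rows above):

```python
# (reads_n, discordant_reads_n, split_mapping_reads_n) from the sample_1 rows above.
rows = {
    "full_length": (2789185, 349618, 1080174),
    "trim_UMI_21": (2005583, 585602, 1713312),
}
for name, (total, disc, split) in rows.items():
    print(f"{name}: discordant {disc / total:.1%}, split {split / total:.1%}")
```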
Here `reads_n` is the output of `samtools view $bam | wc -l`, and the discordant and split-mapping reads were extracted with `lumpy_filter`.
This is from the UMI-tagged BAM file; the soft-clipped bases (the 21S and 20S operations) are the UMI:

```
FS10000223:4:BNT40301-1434:1:1105:6740:3440 99 chr1 26767 0 21S130M = 26786 150 GGTACCCACATAAGGCGAACTCTCTTAGCAGAATGTGTGCCTCTCGGCCGGGCGCAGCGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCGAAGGCAGGCAGATCACCTGAGGTCGGGAGTTTGAGACCAGTCTGACCAACATGGTGAA FFFFFFFFFFF,FFFFFFFF,:::,::F:FFFFFF:::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:FFF NM:i:1 MD:Z:103C26 MC:Z:131M20S AS:i:125 XS:i:125 RG:Z:CC_iSeq_Nov15_701
FS10000223:4:BNT40301-1434:1:1105:6740:3440 147 chr1 26786 0 131M20S = 26767 -150 CTCTCGGCCGGGCGCAGCGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCGAAGGCAGGCAGATCACCTGAGGTCGGGAGTTTGAGACCAGTCTGACCAACATGGTGAAACTCCATCTCTACTAAAAATGTTCGCCTTAATAGGTGGAG :FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,F:,,FF::,FFFFFFF,FFFFFFFFF:F NM:i:1 MD:Z:84C46 MC:Z:21S130M AS:i:126 XS:i:126 RG:Z:CC_iSeq_Nov15_701
```
After trimming the UMI sequences, the records look like this (note the `trim_UMI` read group and the absence of soft clips):

```
FS10000223:4:BNT40301-1434:1:1116:13990:1480 65 chr1 43426 0 130M chr2 29223469 0 TCATCTCAATAGATGCAGAAAAAGCATTAACAAAAGTAAACATTCTTTCATAATAAGACATCAGATAAAACAAATTAGGAATAGAAGGAATGTACCGCAACACAATAAAGGCCATATATAACAAGCCCAC FFFFFF,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:130 MC:Z:18S62M50S AS:i:130 XS:i:130 RG:Z:CC_iSeq_Nov15_701.UMI.trim_UMI XA:Z:chr19,+85037,130M,0;chr15,-101947408,130M,1;
FS10000223:4:BNT40301-1434:1:1105:6740:3440 99 chr1 197289 0 130M = 197308 149 CTCTTAGCAGAATGTGTGCCTCTCGGCCGGGCGCAGCGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCGAAGGCAGGCAGATCACCTGAGGTCGGGAGTTTGAGACCAGTCTGACCAACATGGTGAA :::,::F:FFFFFF:::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:FFF NM:i:1 MD:Z:103C26 MC:Z:130M AS:i:125 XS:i:125 RG:Z:CC_iSeq_Nov15_701.UMI.trim_UMI
```
Using these two sorts of BAM files, I ran LUMPY like this:

```
lumpyexpress -B $bam -o $vcf
```
Take one gene fusion as an example. With the UMI-tagged (soft-clipped) BAM, the result is:

```
chr2 29223529 14_1 N N]chr2:42301391] . . SVTYPE=BND;STRANDS=++:5;EVENT=14;MATEID=14_2;CIPOS=-9,75;CIEND=-7,84;CIPOS95=-1,24;CIEND95=0,24;IMPRECISE;SU=5;PE=5;SR=0 GT:SU:PE:SR ./.:5:5:0
chr2 42301391 14_2 N N]chr2:29223529] . . SVTYPE=BND;STRANDS=++:5;SECONDARY;EVENT=14;MATEID=14_1;CIPOS=-7,84;CIEND=-9,75;CIPOS95=0,24;CIEND95=-1,24;IMPRECISE;SU=5;PE=5;SR=0 GT:SU:PE:SR ./.:5:5:0
```

The supporting read count (SU) is 5.
With the UMI-trimmed BAM:

```
chr2 29223530 13_1 N N]chr2:42301392] . . SVTYPE=BND;STRANDS=++:500166;EVENT=13;MATEID=13_2;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;SU=500166;PE=136681;SR=363485 GT:SU:PE:SR ./.:500166:136681:363485
chr2 42301392 13_2 N N]chr2:29223530] . . SVTYPE=BND;STRANDS=++:500166;SECONDARY;EVENT=13;MATEID=13_1;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;SU=500166;PE=136681;SR=363485 GT:SU:PE:SR ./.:500166:136681:363485
```

The supporting read count (SU) is 500166.
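For anyone comparing the two call sets systematically, the SU/PE/SR counts can be pulled straight out of the INFO column. A small helper (not part of LUMPY; the INFO string is the one from the call above):

```python
def info_field(info: str, key: str) -> str:
    """Return the value of `key` from a semicolon-separated VCF INFO string."""
    for entry in info.split(";"):
        if "=" in entry:
            k, v = entry.split("=", 1)
            if k == key:
                return v
    raise KeyError(key)

info = ("SVTYPE=BND;STRANDS=++:500166;EVENT=13;MATEID=13_2;CIPOS=0,0;CIEND=0,0;"
        "CIPOS95=0,0;CIEND95=0,0;SU=500166;PE=136681;SR=363485")
print(info_field(info, "SU"))  # -> 500166
```

Flag-style INFO entries without an `=` (such as `SECONDARY` and `IMPRECISE`) are skipped.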
I can't upload the BAM files here because of the file-size limit. If you need them, please leave your email.