Open GD8888 opened 2 years ago
Hi @GD8888
First, I would check whether these are annotated or novel junctions (e.g. by adding jM to the --outSAMattributes list). If they are annotated, the alignments are more likely to be correct. You can control the maximum gap of novel splice junctions with --alignIntronMax.
Hi @alexdobin,
Thanks for the tips so far! I have looked into whether these are annotated or novel junctions by including the jM tag. Most are unannotated. I extracted reads with insert sizes over 100 kbp and looked at the jM tags.
Here is the frequency of each tag (this is just for one mate):
111 jM:B:c,0,2
112 jM:B:c,1,0
116 jM:B:c,0,21,21
122 jM:B:c,22,0
133 jM:B:c,2,0
135 jM:B:c,0,0
190 jM:B:c,21,21
233 jM:B:c,21,1
266 jM:B:c,1,1,1
269 jM:B:c,2,2,2
276 jM:B:c,3
281 jM:B:c,0,1
344 jM:B:c,0,22
481 jM:B:c,4
565 jM:B:c,0,21
772 jM:B:c,22,22
840 jM:B:c,2,22
931 jM:B:c,22,2
984 jM:B:c,1,21
1521 jM:B:c,21
4446 jM:B:c,22
7266 jM:B:c,1,1
7485 jM:B:c,2,2
7921 jM:B:c,0
23908 jM:B:c,1
28312 jM:B:c,2
46962 jM:B:c,-1
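For anyone reading the counts above: per the STAR manual, each jM value encodes the intron motif of one junction (0 non-canonical, 1 GT/AG, 2 CT/AC, 3 GC/AG, 4 CT/GC, 5 AT/AC, 6 GT/AT), with 20 added when the junction is annotated, and -1 when the mate contains no junction. Below is a minimal Python sketch of that decoding. The name `decode_jm` is made up for illustration and is not part of STAR.

```python
# Motif codes as listed in the STAR manual for the jM attribute.
MOTIFS = {
    0: "non-canonical",
    1: "GT/AG",
    2: "CT/AC",
    3: "GC/AG",
    4: "CT/GC",
    5: "AT/AC",
    6: "GT/AT",
}


def decode_jm(tag):
    """Decode a tag such as 'jM:B:c,0,21' into (motif, annotated) pairs.

    Hypothetical helper: one pair is returned per junction in the mate,
    and an empty list is returned for 'jM:B:c,-1' (no junction).
    """
    values = [int(v) for v in tag.split(",")[1:]]
    if values == [-1]:
        return []
    decoded = []
    for value in values:
        annotated = value >= 20  # 20 is added for annotated junctions
        code = value - 20 if annotated else value
        decoded.append((MOTIFS.get(code, "unknown"), annotated))
    return decoded


if __name__ == "__main__":
    for tag in ("jM:B:c,-1", "jM:B:c,2", "jM:B:c,0,21"):
        print(tag, decode_jm(tag))
```

Read this way, the largest group in the table (jM:B:c,-1) is mates with no junction at all, and values below 20 are junctions that are not in the annotation.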
I would like to go on to do 2-pass mapping with this data. Would these reads need to be excluded? Would I need to use the --alignIntronMax option in the 1st-pass mapping, or should I filter the junction files after the 1st pass (or both)?
Many thanks
Hi @GD8888
For 2-pass mapping these junctions have to be filtered out; the easiest way is to use --alignIntronMax in both passes.
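If you also want to filter the 1st-pass junction file by hand (the other option raised in the question), a rough Python sketch is below. It assumes the tab-separated SJ.out.tab layout described in the STAR manual, where column 2 is the first intron base and column 3 the last intron base (both 1-based). The function name `filter_junctions` and the 100,000 bp cutoff are illustrative assumptions, not recommended values.

```python
def filter_junctions(lines, max_intron):
    """Keep SJ.out.tab rows whose intron length is at most max_intron.

    Hypothetical helper: `lines` is any iterable of tab-separated rows
    with the chromosome in column 1, intron start in column 2 and
    intron end in column 3.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        start, end = int(fields[1]), int(fields[2])
        intron_length = end - start + 1
        if intron_length <= max_intron:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        "chr_1\t1000\t1999\t1\t1\t0\t12\t0\t40\n",
        "chr_1\t317381\t573729\t2\t2\t0\t3\t0\t6\n",
    ]
    for row in filter_junctions(rows, max_intron=100_000):
        print(row, end="")
```

The filtered rows could then be written to a new file and supplied to the 2nd pass in place of the original junction file.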
Hi,
I have a question about a proportion of my reads that are being mapped with giant insert sizes of 200,000 bp or more. I doubt these are true alignments. Having inspected some of these reads, I see that in some cases reads are clipped or split into small pieces and mapped 250,000 bp away, into other gene bodies or outside gene bodies. I have BLASTed some of these offending reads and they both hit the same gene.
I don't know if this is an issue with overlapping reads, as I do have some overlapping reads in my data. I have tried running with default settings, trimming adaptors before mapping, and options such as --peOverlapNbasesMin 5 (to permit overlapping reads) and --alignEndsProtrude 5 ConcordantPair (to allow protruding ends).
Can I get some advice concerning what is going on and which settings to use for my data?
Here is an example of the reads in question below:
A00742:44:HMG2MDSXX:4:1331:31937:3662 99 chr_1 317246 255 24M1I110M256349N6M = 317246 256489 GAACTTCCCAAGGACATGGGAACGGCCAGGACACGGGGCCACAGGGCTGGTGGAGGGGCAGGACCCTCCGGGACCAGGCACAGGGACCCCAGGGGGGAACACACCCCCCGCTCTCCTCTCCTTACTTCCCCCGACCCCTTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:252 nM:i:8
A00742:44:HMG2MDSXX:4:1331:31937:3662 147 chr_1 317246 255 24M1I110M256349N6M = 317246 -256489 GAACTTCCCAAGGACATGGGAACGGCCAGGACACGGGGCCACAGGGCTGGTGGAGGGGCAGGACCCTCCGGGACCAGGCACAGGGACCCCAGGGGGGAACACACCCCCCGCTCTCCTCTCCTTACTTCCCCCGACCCCTTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:252 nM:i:8
A00742:44:HMG2MDSXX:3:1452:22869:31172 99 chr_2 48903495 255 72M12602N78M = 48916261 153860 GGCTGGTGGGAGATGGGCTTGGAGACTTCCCAGAGCGGGGAGTAGAAAGTTGGCTACCAGGGGTTCCATTCACTGAGCTAGTAGAAGCAGGAGACTTGCTGCGTCGGGATGGGCCTGGGCTCTTGGTGGTGCTCCTACGTGAACTCGAAG FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF NH:i:1 HI:i:1 AS:i:294 nM:i:2
A00742:44:HMG2MDSXX:3:1452:22869:31172 147 chr_2 48916261 255 25M39503N100M101441N25M = 48903495 -153860 GAAGTAGATTTCACCACCCGACATTCACTTTCATCCAGCAAGAAGTCATCCTGGTAACGGAACTTTTCCGGTCCACATGCAATAAAAATGTCATCATCGCCAAAAAAATCCTGAAGGCACATCACCTGTTTTCCATCTAGTGTGTACAAG ,FF,F:FFFFFFF:FF,F,FFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFF,FFFFFFF:FFFFFF:FFFFFFFFFFF:FFFFFFF:FFFF:FFFFFFFFFF::FFFFFFFFF:F:FFFFFFFFFFFF: NH:i:1 HI:i:1 AS:i:294 nM:i:2
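In these records the large insert sizes come from the N operations in the CIGAR strings, which mark the spliced gaps. As a small illustration, here is a Python sketch that pulls the gap lengths out of a CIGAR string so they can be compared with a chosen cutoff. The function name `cigar_gaps` is hypothetical and the regular expression assumes the standard SAM CIGAR operators.

```python
import re

# One CIGAR element is a length followed by a standard SAM operator.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_gaps(cigar):
    """Return the lengths of all N (skipped region) operations in a CIGAR."""
    return [int(length) for length, op in CIGAR_ELEMENT.findall(cigar) if op == "N"]


if __name__ == "__main__":
    # CIGAR strings copied from the example reads above.
    for cigar in ("24M1I110M256349N6M", "72M12602N78M", "25M39503N100M101441N25M"):
        print(cigar, cigar_gaps(cigar))
```

For the first pair, the single 256349 bp gap before the last 6 bases accounts for almost all of the reported template length of 256489.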
Thanks