Closed CDieterich closed 4 years ago
I think this issue is related to the following observations.
JACUSA2 call-1 using identical parameters on the same BAM file, once with -P FR-SECONDSTRAND, the other with -P RF-FIRSTSTRAND, yields identical results.
## JACUSA2 Version: 2.0.0-RC13 call-1 -p 12 -P FR-SECONDSTRAND -m 1 -q 20 -c 1 -a D,Y -r FR_107339B_SLAM_1h.tab /prj/MCF7_RNA_turnover_Naarmann/CD30-analysis/mapping/107339B_SLAM_1h.sortdedup.bam
#contig start end name score strand bases11 info filter ref
1 10559 10560 call-1 1.345028283936324 + 0,0,1,0 * * C
1 10622 10623 call-1 2.714893275683302 + 0,2,0,0 * * T
1 10648 10649 call-1 1.345028283941954 + 1,0,0,0 * * G
1 14114 14115 call-1 0.680230635087355 - 1,0,0,4 * * T
1 14247 14248 call-1 2.8159531447923136 - 1,3,0,0 * * A
1 14353 14354 call-1 2.7148932756517734 - 0,0,0,2 * * G
## JACUSA2 Version: 2.0.0-RC13 call-1 -p 12 -P RF-FIRSTSTRAND -m 1 -q 20 -c 1 -a D,Y -r _107339B_SLAM_1h.tab /prj/MCF7_RNA_turnover_Naarmann/CD30-analysis/mapping/107339B_SLAM_1h.sortdedup.bam
#contig start end name score strand bases11 info filter ref
1 10559 10560 call-1 1.345028283936324 + 0,0,1,0 * * C
1 10622 10623 call-1 2.714893275683302 + 0,2,0,0 * * T
1 10648 10649 call-1 1.345028283941954 + 1,0,0,0 * * G
1 14114 14115 call-1 0.680230635087355 - 1,0,0,4 * * T
1 14247 14248 call-1 2.8159531447923136 - 1,3,0,0 * * A
1 14353 14354 call-1 2.7148932756517734 - 0,0,0,2 * * G
Same behaviour with JACUSA2 pileup
## JACUSA2 Version: 2.0.0-RC13 pileup -p 12 -m1 1 -q1 20 -c1 1 -P1 RF-FIRSTSTRAND -m2 1 -q2 20 -c2 1 -P2 FR-SECONDSTRAND -a D,Y -r pileup_107339B_SLAM_1h.tab /prj/MCF7_RNA_turnover_Naarmann/CD30-analysis/mapping/107339B_SLAM_1h.sortdedup.bam /prj/MCF7_RNA_turnover_Naarmann/CD30-analysis/mapping/107339B_SLAM_1h.sortdedup.bam
#contig start end name stat strand bases11 bases21 info filter ref
1 10540 10541 pileup 2.0 + 0,1,0,0 0,1,0,0 * * C
1 10541 10542 pileup 2.0 + 0,1,0,0 0,1,0,0 * * C
1 10542 10543 pileup 2.0 + 0,0,1,0 0,0,1,0 * * G
1 10543 10544 pileup 2.0 + 1,0,0,0 1,0,0,0 * * A
1 10544 10545 pileup 2.0 + 1,0,0,0 1,0,0,0 * * A
1 10545 10546 pileup 2.0 + 1,0,0,0 1,0,0,0 * * A
1 10546 10547 pileup 2.0 + 0,0,0,1 0,0,0,1 * * T
1 10547 10548 pileup 2.0 + 0,1,0,0 0,1,0,0 * * C
1 10548 10549 pileup 2.0 + 0,0,0,1 0,0,0,1 * * T
1 10549 10550 pileup 2.0 + 0,0,1,0 0,0,1,0 * * G
I add one selected examples illustrating what would be the expected behaviour.
My library is RF-FIRSTSTRAND, so I have /2---> <---1/, or mate 2 is from the original strand. For one particular entry, JACUSA2 yields
1 10622 10623 call-1 2.714893275683302 + 0,2,0,0 * * T
The matching BAM entries (discarding secondary alignments, I think this is what JACUSA2 does, right?) are:
[mate1] A00620:41:HKMKVDMXX:2:1333:23782:9330 99 1 10541 1 89M2D12M = 10570 130 CCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:19C62T6^GC12 PG:Z:MarkDuplicates NH:i:4 HI:i:1 jI:B:i,-1 NM:i:4 jM:B:c,-1 nM:i:4 AS:i:178
[mate2] A00620:41:HKMKVDMXX:2:1333:23782:9330 147 1 10570 1 60M2D39M = 10541 -130 CCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCCGCGCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF: MD:Z:53T6^GC17G21 PG:Z:MarkDuplicates NH:i:4 HI:i:1 jI:B:i,-1 NM:i:4 jM:B:c,-1 nM:i:4 AS:i:178
Here, mate1 is mapped in sense strand, and mate2 is mapped anti-sense, i.e. mate2 originates from the reverse strand. Both are properly mapped in pairs.
I find these mismatches: [mate1] pos=10622, ref=T, base=C (would be reported as A->G) [mate2] pos=10622, ref=A, base=G
This one fragment (2 mates) overlap the "T->C mismatch" reported by JACUSA2.
Incidentally:
I have added to the cloud
Dear Michael, RC14 seems to be working as expected, so unless @CDieterich sees any other related issue, I'll close this one. Thanks.
Hi Michael,
jacusa.jar Version: 1.3.5 call-2 -s -c 5 -P RF-FIRSTSTRAND,RF-FIRSTSTRAND -p 10 -W 1000000 -F 1024 --filterNM_1 5 --filterNM_2 5 -T 1 -a D,M,Y -u DirMult:showAlpha -R -r call2_RNA_KO K002000225_86972_STARmapping_uniq_rmdup.bam,K002000225_86975_STARmapping_uniq_rmdup.bam,K002000225_86976_STARmapping_uniq_rmdup.bam K002000225_86971_STARmapping_uniq_rmdup.bam,K002000225_86974_STARmapping_uniq_rmdup.bam,K002000225_86973_STARmapping_uniq_rmdup.bam
and
JACUSA2 Version: 2.0.0-RC13 call-2 -F 1024 -c 10 -p 10 -D -I -a D,Y -P1 RF-FIRSTSTRAND -P2 RF-FIRSTSTRAND -r call2_RNA_KO_RC9_call2_result.out K002000225_86972_STARmapping_uniq_rmdup.bam,K002000225_86975_STARmapping_uniq_rmdup.bam,K002000225_86976_STARmapping_uniq_rmdup.bam K002000225_86971_STARmapping_uniq_rmdup.bam,K002000225_86974_STARmapping_uniq_rmdup.bam,K002000225_86973_STARmapping_uniq_rmdup.bam
produce different results on the classical ADAR1 E861A allele.
JACUSA2 says: 3 89746630 89746631 call-2 5.625566079828488 - 0,0,11,16 0,0,11,3 0,0,8,16 0,0,25,0 0,0,24,0 0,0,9,5 T 3 89746631 89746632 call-2 5.6063230362406955 - 11,0,0,15 11,0,0,3 8,0,0,16 25,0,0,0 24,0,0,0 9,0,0,5 T
while JACUSA1.3.5 says: 3 89746630 89746631 variant 5.625566079896998 + 16,11,0,0 3,11,0,0 16,8,0,0 0,25,0,0 0,24,0,0 5,9,0,0 alpha1=11188.384184816841:9601.85412112383:69.8105553602469:69.8105553602469;alpha2=0.5647767609649967:7.290456409137258:0.20697830129410996:0.20697830129410996;alphaP=0.9016251286991446:2.6414091415873204:0.16117799024330678:0.16117799024330678;initAlpha1=0.5346369330406189:0.45873559060709357:0.003313738176143709:0.003313738176143709;initAlpha2=0.08164001235762022:0.9116972755004218:0.00333135607097912:0.00333135607097912;initAlphaP=0.31167095769867736:0.6816842228240553:0.00332240973863364:0.00332240973863364;iterations1=23;iterations2=10;iterationsP=10;logLikelihood1=-47.93415535953375;logLikelihood2=-19.00650741266077;logLikelihoodP=-72.56622885209151 A 3 89746631 89746632 variant 5.606323036284408 + 15,0,0,11 3,0,0,11 16,0,0,8 0,0,0,25 0,0,0,24 5,0,0,9 alpha1=11070.40293653336:69.91107380907921:69.91107380907921:9777.979127320734;alpha2=0.5627590466451253:0.20628169884562636:0.20628169884562636:7.266380686311476;alphaP=0.9128990637495525:0.16174569173203854:0.16174569173203854:2.705863471753122;initAlpha1=0.5275266396288331:0.003309608329803837:0.003309608329803837:0.4658541437115592;initAlpha2=0.08162740582152242:0.0033148272976983003:0.0033148272976983003:0.9117429395830811;initAlphaP=0.3063304358925807:0.00331219729116136:0.00331219729116136:0.6870451695250964;iterations1=23;iterations2=10;iterationsP=10;logLikelihood1=-47.25731272570556;logLikelihood2=-18.986209684475106;logLikelihoodP=-71.84984544646508 A
I think JACUSA 1.3.5 is right !