yangmqglobe closed this 3 years ago
Okay. So to confirm I follow, here's what you did:
Version1 (same as PEPATAC): For SRR7866359_filter_dedup.bam
you 1) aligned, 2) removed low quality reads, 3) removed duplicates
Version2: For SRR7866359_dedup_filter.bam
you 1) aligned, 2) removed duplicates, 3) removed low quality reads
--
I grabbed SRR7866359 and performed the same. I then intersected the two bam files with bedtools intersect.
Just like in your example, filtering first leaves more reads than the reverse. Based on the intersect, there are no reads in the "Version2" approach that are unique. Meaning, while the Version1 approach retains extra reads, doing it with Version2 doesn't retain a different set of unique reads, just fewer total reads. So that's interesting.
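To illustrate how the ordering alone could produce that pattern (Version2 being a strict subset of Version1), here's a toy sketch. Everything in it is an assumption for illustration: the reads are made up, duplicates are defined as "same position", the deduplicator keeps the first read it sees at each position, and the MAPQ cutoff of 10 is arbitrary. It is not how samtools, Picard, or PEPATAC actually pick the read to keep.

```python
# Hypothetical toy reads: (name, position, mapq).
# Reads sharing a position are treated as duplicates of each other.
READS = [
    ("r1", 100, 5),   # low MAPQ, seen first at position 100
    ("r2", 100, 40),  # high MAPQ duplicate of r1
    ("r3", 200, 40),
    ("r4", 200, 40),  # duplicate of r3
    ("r5", 300, 3),   # low MAPQ, no duplicate
]

MIN_MAPQ = 10  # assumed threshold, for illustration only


def quality_filter(reads, min_mapq=MIN_MAPQ):
    """Drop reads whose MAPQ is below the threshold."""
    return [r for r in reads if r[2] >= min_mapq]


def dedup(reads):
    """Keep only the first read encountered at each position (assumed rule)."""
    kept = {}
    for r in reads:
        kept.setdefault(r[1], r)
    return list(kept.values())


version1 = dedup(quality_filter(READS))  # filter, then dedup
version2 = quality_filter(dedup(READS))  # dedup, then filter

names1 = {r[0] for r in version1}
names2 = {r[0] for r in version2}

print("Version1:", sorted(names1))
print("Version2:", sorted(names2))
print("Only in Version1:", sorted(names1 - names2))
print("Only in Version2:", sorted(names2 - names1))
```

In this toy model, deduplicating first can keep the low-quality r1 as the representative at position 100, which the filter then removes, so the whole duplicate group disappears. Filtering first drops r1 and lets r2 survive.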
Then, I looked at where those reads unique to the Version1 approach mapped (checking the result of the bedtools intersect call). Here's that result:
samtools idxstats SRR7866359_reads_unique_to_version1_approach.bam
chr1 248956422 2187 0
chr2 242193529 2358 0
chr3 198295559 1438 0
chr4 190214555 1625 0
chr5 181538259 1530 0
chr6 170805979 1259 0
chr7 159345973 1785 0
chr8 145138636 1109 0
chr9 138394717 1352 0
chr10 133797422 1369 0
chr11 135086622 964 0
chr12 133275309 1269 0
chr13 114364328 670 0
chr14 107043718 614 0
chr15 101991189 869 0
chr16 90338345 947 0
chr17 83257441 906 0
chr18 80373285 627 0
chr19 58617616 590 0
chr20 64444167 620 0
chr21 46709983 402 0
chr22 50818468 547 0
chrX 156040895 1332 0
chrY 57227415 76 0
chrM 16569 0 0
chr1_KI270706v1_random 175055 17 0
chr1_KI270707v1_random 32032 2 0
chr1_KI270708v1_random 127682 0 0
chr1_KI270709v1_random 66860 23 0
chr1_KI270710v1_random 40176 0 0
chr1_KI270711v1_random 42210 2 0
chr1_KI270712v1_random 176043 2 0
chr1_KI270713v1_random 40745 0 0
chr1_KI270714v1_random 41717 6 0
chr2_KI270715v1_random 161471 2 0
chr2_KI270716v1_random 153799 4 0
chr3_GL000221v1_random 155397 2 0
chr4_GL000008v2_random 209709 22 0
chr5_GL000208v1_random 92689 5 0
chr9_KI270717v1_random 40062 4 0
chr9_KI270718v1_random 38054 1 0
chr9_KI270719v1_random 176845 4 0
chr9_KI270720v1_random 39050 4 0
chr11_KI270721v1_random 100316 0 0
chr14_GL000009v2_random 201709 27 0
chr14_GL000225v1_random 211173 56 0
chr14_KI270722v1_random 194050 0 0
chr14_GL000194v1_random 191469 25 0
chr14_KI270723v1_random 38115 2 0
chr14_KI270724v1_random 39555 0 0
chr14_KI270725v1_random 172810 14 0
chr14_KI270726v1_random 43739 0 0
chr15_KI270727v1_random 448248 3 0
chr16_KI270728v1_random 1872759 28 0
chr17_GL000205v2_random 185591 8 0
chr17_KI270729v1_random 280839 19 0
chr17_KI270730v1_random 112551 1 0
chr22_KI270731v1_random 150754 3 0
chr22_KI270732v1_random 41543 6 0
chr22_KI270733v1_random 179772 13 0
chr22_KI270734v1_random 165050 2 0
chr22_KI270735v1_random 42811 6 0
chr22_KI270736v1_random 181920 6 0
chr22_KI270737v1_random 103838 4 0
chr22_KI270738v1_random 99375 0 0
chr22_KI270739v1_random 73985 0 0
chrY_KI270740v1_random 37240 0 0
chrUn_KI270302v1 2274 0 0
chrUn_KI270304v1 2165 0 0
chrUn_KI270303v1 1942 2 0
chrUn_KI270305v1 1472 0 0
chrUn_KI270322v1 21476 0 0
chrUn_KI270320v1 4416 0 0
chrUn_KI270310v1 1201 0 0
chrUn_KI270316v1 1444 0 0
chrUn_KI270315v1 2276 0 0
chrUn_KI270312v1 998 0 0
chrUn_KI270311v1 12399 0 0
chrUn_KI270317v1 37690 0 0
chrUn_KI270412v1 1179 0 0
chrUn_KI270411v1 2646 0 0
chrUn_KI270414v1 2489 0 0
chrUn_KI270419v1 1029 0 0
chrUn_KI270418v1 2145 0 0
chrUn_KI270420v1 2321 0 0
chrUn_KI270424v1 2140 0 0
chrUn_KI270417v1 2043 0 0
chrUn_KI270422v1 1445 0 0
chrUn_KI270423v1 981 0 0
chrUn_KI270425v1 1884 0 0
chrUn_KI270429v1 1361 0 0
chrUn_KI270442v1 392061 27 0
chrUn_KI270466v1 1233 0 0
chrUn_KI270465v1 1774 0 0
chrUn_KI270467v1 3920 1 0
chrUn_KI270435v1 92983 5 0
chrUn_KI270438v1 112505 21 0
chrUn_KI270468v1 4055 0 0
chrUn_KI270510v1 2415 0 0
chrUn_KI270509v1 2318 0 0
chrUn_KI270518v1 2186 0 0
chrUn_KI270508v1 1951 0 0
chrUn_KI270516v1 1300 0 0
chrUn_KI270512v1 22689 0 0
chrUn_KI270519v1 138126 13 0
chrUn_KI270522v1 5674 0 0
chrUn_KI270511v1 8127 0 0
chrUn_KI270515v1 6361 3 0
chrUn_KI270507v1 5353 2 0
chrUn_KI270517v1 3253 0 0
chrUn_KI270529v1 1899 0 0
chrUn_KI270528v1 2983 0 0
chrUn_KI270530v1 2168 0 0
chrUn_KI270539v1 993 0 0
chrUn_KI270538v1 91309 7 0
chrUn_KI270544v1 1202 0 0
chrUn_KI270548v1 1599 0 0
chrUn_KI270583v1 1400 0 0
chrUn_KI270587v1 2969 0 0
chrUn_KI270580v1 1553 1 0
chrUn_KI270581v1 7046 0 0
chrUn_KI270579v1 31033 2 0
chrUn_KI270589v1 44474 0 0
chrUn_KI270590v1 4685 0 0
chrUn_KI270584v1 4513 1 0
chrUn_KI270582v1 6504 2 0
chrUn_KI270588v1 6158 0 0
chrUn_KI270593v1 3041 2 0
chrUn_KI270591v1 5796 0 0
chrUn_KI270330v1 1652 0 0
chrUn_KI270329v1 1040 2 0
chrUn_KI270334v1 1368 0 0
chrUn_KI270333v1 2699 0 0
chrUn_KI270335v1 1048 0 0
chrUn_KI270338v1 1428 0 0
chrUn_KI270340v1 1428 0 0
chrUn_KI270336v1 1026 0 0
chrUn_KI270337v1 1121 2 0
chrUn_KI270363v1 1803 0 0
chrUn_KI270364v1 2855 0 0
chrUn_KI270362v1 3530 0 0
chrUn_KI270366v1 8320 0 0
chrUn_KI270378v1 1048 0 0
chrUn_KI270379v1 1045 0 0
chrUn_KI270389v1 1298 0 0
chrUn_KI270390v1 2387 0 0
chrUn_KI270387v1 1537 0 0
chrUn_KI270395v1 1143 0 0
chrUn_KI270396v1 1880 0 0
chrUn_KI270388v1 1216 0 0
chrUn_KI270394v1 970 0 0
chrUn_KI270386v1 1788 0 0
chrUn_KI270391v1 1484 0 0
chrUn_KI270383v1 1750 0 0
chrUn_KI270393v1 1308 2 0
chrUn_KI270384v1 1658 0 0
chrUn_KI270392v1 971 0 0
chrUn_KI270381v1 1930 0 0
chrUn_KI270385v1 990 0 0
chrUn_KI270382v1 4215 0 0
chrUn_KI270376v1 1136 0 0
chrUn_KI270374v1 2656 0 0
chrUn_KI270372v1 1650 0 0
chrUn_KI270373v1 1451 0 0
chrUn_KI270375v1 2378 0 0
chrUn_KI270371v1 2805 0 0
chrUn_KI270448v1 7992 1 0
chrUn_KI270521v1 7642 2 0
chrUn_GL000195v1 182896 15 0
chrUn_GL000219v1 179198 13 0
chrUn_GL000220v1 161802 17 0
chrUn_GL000224v1 179693 21 0
chrUn_KI270741v1 157432 0 0
chrUn_GL000226v1 15008 0 0
chrUn_GL000213v1 164239 2 0
chrUn_KI270743v1 210658 16 0
chrUn_KI270744v1 168472 22 0
chrUn_KI270745v1 41891 2 0
chrUn_KI270746v1 66486 4 0
chrUn_KI270747v1 198735 7 0
chrUn_KI270748v1 93321 0 0
chrUn_KI270749v1 158759 8 0
chrUn_KI270750v1 148850 8 0
chrUn_KI270751v1 150742 11 0
chrUn_KI270752v1 27745 2 0
chrUn_KI270753v1 62944 0 0
chrUn_KI270754v1 40191 2 0
chrUn_KI270755v1 36723 0 0
chrUn_KI270756v1 79590 4 0
chrUn_KI270757v1 71251 13 0
chrUn_GL000214v1 137718 7 0
chrUn_KI270742v1 186739 10 0
chrUn_GL000216v2 176608 31 0
chrUn_GL000218v1 161147 14 0
chrEBV 171823 0 0
* 0 0 0
Okay, so we learn there that they do map across the genome.
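If it helps to summarize a listing like that instead of eyeballing it, here's a minimal sketch that parses `samtools idxstats` text (reference name, length, mapped reads, unmapped reads) and totals the mapped reads by contig type. The function names and the three categories are my own choices, and the excerpt is just a few rows copied from the output above.

```python
def parse_idxstats(text):
    """Parse idxstats text into {reference_name: mapped_read_count}."""
    counts = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip anything that is not a 4-column idxstats row
        name, _length, mapped, _unmapped = fields
        if name == "*":
            continue  # the '*' row reports reads with no reference
        counts[name] = int(mapped)
    return counts


def summarize(counts):
    """Total mapped reads on primary chromosomes, chrM, and other contigs."""
    primary = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}
    summary = {"primary": 0, "chrM": 0, "other": 0}
    for name, mapped in counts.items():
        if name in primary:
            summary["primary"] += mapped
        elif name == "chrM":
            summary["chrM"] += mapped
        else:
            summary["other"] += mapped
    return summary


# A few rows copied from the listing above.
EXCERPT = """chr1 248956422 2187 0
chr2 242193529 2358 0
chrM 16569 0 0
chr14_GL000225v1_random 211173 56 0
chrUn_GL000216v2 176608 31 0
* 0 0 0"""

excerpt_summary = summarize(parse_idxstats(EXCERPT))
print(excerpt_summary)
```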
If we investigate the samtools flags, there are only 4 flags for these reads: 147/99 and 163/83. These are the standard flag combinations for properly paired reads: in each pair, one mate (147 or 83) maps to the reverse strand and the other (99 or 163) maps to the forward strand with its mate on the reverse strand. Not sure yet what it is about these reads that gets them retained when you perform QC filtering before deduplication... Will keep investigating.
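To check what those four flags encode, the bits can be decoded directly. This is a small sketch using the bit values from the SAM specification; the short labels and the `decode_flag` helper are my own naming.

```python
# SAM FLAG bits (values from the SAM specification; labels are shorthand).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag, lowest bit first."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


for flag in (99, 147, 83, 163):
    print(flag, decode_flag(flag))
```

Flags 99 and 163 describe a forward-strand read whose mate is on the reverse strand, while 147 and 83 describe the reverse-strand mate. All four have the proper-pair bit set.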
So I've checked this by switching these two steps; the result is
Obviously the results differ. But why? Is it because a low-mapping-quality read is paired with a high-mapping-quality read? Should we treat these reads as unpaired and filter them out before calling peaks?
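To make the question concrete, here's a toy comparison of a per-read MAPQ filter against a pair-aware one that keeps a fragment only when both mates pass. The records, the cutoff of 10, and both function names are hypothetical. This only shows the difference between the two policies and says nothing about which one the pipeline should use.

```python
from collections import defaultdict

# Hypothetical records: (fragment_name, mate_number, mapq).
RECORDS = [
    ("fragA", 1, 42), ("fragA", 2, 42),  # both mates pass
    ("fragB", 1, 40), ("fragB", 2, 2),   # one mate has low MAPQ
    ("fragC", 1, 0), ("fragC", 2, 1),    # both mates fail
]


def per_read_filter(records, min_mapq=10):
    """Filter each read independently; this can leave orphaned mates."""
    return [r for r in records if r[2] >= min_mapq]


def pair_aware_filter(records, min_mapq=10):
    """Keep a fragment only if both of its mates pass the MAPQ cutoff."""
    by_name = defaultdict(list)
    for rec in records:
        by_name[rec[0]].append(rec)
    kept = []
    for mates in by_name.values():
        if len(mates) == 2 and all(m[2] >= min_mapq for m in mates):
            kept.extend(mates)
    return kept


per_read = per_read_filter(RECORDS)
pair_aware = pair_aware_filter(RECORDS)

print("Per-read:  ", per_read)
print("Pair-aware:", pair_aware)
```

With the per-read filter, fragB's first mate survives without its partner. The pair-aware filter drops fragB entirely.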