XiaoTaoWang / HiC_pipeline

An easy-to-use Hi-C data processing software supporting distributed computation.
http://xiaotaowang.github.io/HiC_pipeline/index.html
GNU General Public License v3.0

Differences between runs with and without --chunk? #6

shuzhangcourage closed this issue 3 years ago

shuzhangcourage commented 3 years ago

Hi Xiaotao, I have a question, though I'm not sure whether the problem is only on my side. I ran the same data with and without --chunk, and the results differ somewhat: the --chunk run seems to filter out more reads. I'm not sure whether this is because I submitted multiple sbatch jobs (i.e., many jobs at once). Do you have any clues? Thanks!

Shu

Here are the results without --chunk:

000_SequencedReads: 461569471
010_DoubleSideMappedReads: 376308035
020_SingleSideMappedReads: 72212411
030_UnmappedReads: 13049025
100_NormalPairs: 376308035
110_AfterFilteringReads: 305410745
120_SameFragmentReads: 15620360
122_SelfLigationReads: 182740
124_DanglingReads: 15354550
126_UnknownMechanism: 83070
130_DuplicateRemoved: 55276930
400_TotalContacts: 305410745
410_IntraChromosomalReads: 219433823
412_IntraLongRangeReads(>=20Kb): 141781999
412_IntraShortRangeReads(<20Kb): 77651824
420_InterChromosomalReads: 85976922

Critical Indicators:
Double Unique Mapped Ratio = 376308035 / 461569471 = 0.8153
Self-Ligation Ratio = 182740 / 461569471 = 0.0004
Dangling-Reads Ratio = 15354550 / 461569471 = 0.0333
Long-Range Ratio = 141781999 / 305410745 = 0.4642
Data Usage = 305410745 / 461569471 = 0.6617
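The critical indicators are simple ratios over the stats counters: most are normalized by 000_SequencedReads, while the long-range ratio is normalized by 400_TotalContacts. Below is a minimal Python sketch that recomputes them for the run without --chunk. The dictionary keys mirror the labels in the report; the helper name `critical_indicators` is hypothetical and is not part of runHiC's API.

```python
# Subset of the stats from the run without --chunk (copied from the report).
stats = {
    "000_SequencedReads": 461569471,
    "010_DoubleSideMappedReads": 376308035,
    "122_SelfLigationReads": 182740,
    "124_DanglingReads": 15354550,
    "400_TotalContacts": 305410745,
    "412_IntraLongRangeReads(>=20Kb)": 141781999,
}


def critical_indicators(s):
    """Recompute the 'Critical Indicators' ratios (hypothetical helper, not runHiC code)."""
    total = s["000_SequencedReads"]
    contacts = s["400_TotalContacts"]
    return {
        "Double Unique Mapped Ratio": s["010_DoubleSideMappedReads"] / total,
        "Self-Ligation Ratio": s["122_SelfLigationReads"] / total,
        "Dangling-Reads Ratio": s["124_DanglingReads"] / total,
        # The long-range ratio uses total contacts, not sequenced reads, as denominator.
        "Long-Range Ratio": s["412_IntraLongRangeReads(>=20Kb)"] / contacts,
        "Data Usage": contacts / total,
    }


for name, value in critical_indicators(stats).items():
    print(f"{name} = {value:.4f}")
```

Rounded to four decimals, this reproduces the values in the report (0.8153, 0.0004, 0.0333, 0.4642, 0.6617).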

Results with --chunk:

000_SequencedReads: 461569471
010_DoubleSideMappedReads: 376308258
020_SingleSideMappedReads: 72212190
030_UnmappedReads: 13049023
100_NormalPairs: 376308258
110_AfterFilteringReads: 284577893
120_SameFragmentReads: 15620360
122_SelfLigationReads: 182740
124_DanglingReads: 15354550
126_UnknownMechanism: 83070
130_DuplicateRemoved: 76110005
400_TotalContacts: 284577893
410_IntraChromosomalReads: 204485058
412_IntraLongRangeReads(>=20Kb): 132094437
412_IntraShortRangeReads(<20Kb): 72390621
420_InterChromosomalReads: 80092835

Critical Indicators:
Double Unique Mapped Ratio = 376308258 / 461569471 = 0.8153
Self-Ligation Ratio = 182740 / 461569471 = 0.0004
Dangling-Reads Ratio = 15354550 / 461569471 = 0.0333
Long-Range Ratio = 132094437 / 284577893 = 0.4642
Data Usage = 284577893 / 461569471 = 0.6165
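Comparing the two reports counter by counter helps localize the discrepancy. The minimal Python sketch below assumes that 110_AfterFilteringReads = 100_NormalPairs - 120_SameFragmentReads - 130_DuplicateRemoved. Both reports' numbers satisfy this identity, but it is inferred from the data rather than taken from runHiC's documentation. The helper name `filtering_balance` is hypothetical.

```python
# Subset of the stats reported for the two runs (copied from the reports).
no_chunk = {
    "010_DoubleSideMappedReads": 376308035,
    "100_NormalPairs": 376308035,
    "110_AfterFilteringReads": 305410745,
    "120_SameFragmentReads": 15620360,
    "130_DuplicateRemoved": 55276930,
}
chunk = {
    "010_DoubleSideMappedReads": 376308258,
    "100_NormalPairs": 376308258,
    "110_AfterFilteringReads": 284577893,
    "120_SameFragmentReads": 15620360,
    "130_DuplicateRemoved": 76110005,
}


def filtering_balance(s):
    """Residual of 100 - 120 - 130 - 110; zero if the assumed identity holds."""
    return (s["100_NormalPairs"] - s["120_SameFragmentReads"]
            - s["130_DuplicateRemoved"] - s["110_AfterFilteringReads"])


# Per-counter difference: chunked run minus non-chunked run.
diff = {k: chunk[k] - no_chunk[k] for k in no_chunk}

for k, d in diff.items():
    print(f"{k}: {d:+d}")
print("balance (no chunk):", filtering_balance(no_chunk))
print("balance (chunk):", filtering_balance(chunk))
```

Mapping differs by only 223 pairs, and same-fragment filtering is identical. The roughly 20.8 million fewer pairs kept in the chunked run match the roughly 20.8 million extra pairs counted under 130_DuplicateRemoved. This points at the duplicate-removal step rather than mapping or fragment filtering.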

XiaoTaoWang commented 3 years ago

Hi Shu,

I couldn't reproduce this inconsistency. Did you use the same runHiC version for both runs? As mentioned in issue #5, runHiC versions after 0.8.4 use a different PCR duplicate removal strategy from previous versions.

Xiaotao

shuzhangcourage commented 3 years ago

Hi Xiaotao, thank you for the quick reply! It wasn't a version problem. I reran my data and everything went well. I don't know what went wrong that time, but it's resolved now! Thanks again! Shu