ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

pcOri values not matched between different conditions #54

Closed KunFang93 closed 1 year ago

KunFang93 commented 1 year ago

Hi,

Sorry to bother you again. When I tried to count the number of A/B compartments from the pcOri files, I noticed that the pcOri values do not match between the different combinations of groups I tried. How can I solve this problem? Here are the mismatched results:

(dchic) [kun@G1400PNG-AP02LP dcHic_results]$ head -3 DifferentialResult/NT_PT_RT_100000/pcOri/intra_sample_chr6_combined.pcOri.bedGraph
chr start   end NT1_100000  NT2_100000  PT1_100000  PT2_100000  PT3_100000  PT4_100000  PT5_100000  RT1_100000  RT2_100000  RT3_100000  RT4_100000  RT5_100000
chr6    100000  200000  29.58381    32.6497 -7.01236    36.4471 36.70081    -4.29244    38.89834    **2.74258   -5.85739    8.86721 -1.87691    4.20834**
chr6    200000  300000  40.44635    40.62106    -7.06231    48.59131    42.51795    -12.05881   46.05725    -2.91815    -4.64475    6.15132 -1.01923    2.7069
(dchic) [kun@G1400PNG-AP02LP dcHic_results]$ head -3 DifferentialResult/NT_TT_100000/pcOri/intra_sample_chr6_combined.pcOri.bedGraph
chr start   end NT1_100000  NT2_100000  PT1_100000  PT2_100000  PT3_100000  PT4_100000  PT5_100000  RT1_100000  RT2_100000  RT3_100000  RT4_100000  RT5_100000
chr6    100000  200000  29.58381    32.6497 -7.01236    36.4471 36.70081    -4.29244    38.89834    **37.81121  34.08116    40.03612    31.64915    45.43659**
chr6    200000  300000  40.44635    40.62106    -7.06231    48.59131    42.51795    -12.05881   46.05725    43.78865    42.85465    50.77905    44.46731    54.24701
(dchic) [kun@G1400PNG-AP02LP dcHic_results]$ head -3 DifferentialResult/PT_RT_100000/pcOri/intra_sample_chr6_combined.pcOri.bedGraph
chr start   end PT1_100000  PT2_100000  PT3_100000  PT4_100000  PT5_100000  RT1_100000  RT2_100000  RT3_100000  RT4_100000  RT5_100000
chr6    100000  200000  -7.01236    36.4471 36.70081    -4.29244    38.89834    **2.74258   -5.85739    8.86721 -1.87691    4.20834**
chr6    200000  300000  -7.06231    48.59131    42.51795    -12.05881   46.05725    -2.91815    -4.64475    6.15132 -1.01923    2.7069

It looks like the RT PC values in NT_PT_RT match those in PT_RT but do not match those in NT_TT (where TT = PT + RT).

conditions  RT1_100000  RT2_100000  RT3_100000  RT4_100000  RT5_100000
NT_PT_RT    2.74258     -5.85739    8.86721     -1.87691    4.20834
NT_TT       37.81121    34.08116    40.03612    31.64915    45.43659
PT_RT       2.74258     -5.85739    8.86721     -1.87691    4.20834
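A comparison like the one in this table can be automated. The following is a minimal sketch, assuming the pcOri bedGraph files are whitespace-separated with a header row (chr, start, end, then one column per sample), as in the output above. The helper names `read_pcori` and `mismatched_samples` are hypothetical and are not part of dcHiC. The values are copied from the first chr6 bin shown above.

```python
def read_pcori(text):
    """Parse pcOri bedGraph text into {(chr, start, end): {sample: score}}."""
    rows = [line.split() for line in text.strip().splitlines()]
    samples = rows[0][3:]  # columns after chr/start/end are sample names
    return {
        (f[0], int(f[1]), int(f[2])): dict(zip(samples, map(float, f[3:])))
        for f in rows[1:]
    }


def mismatched_samples(a, b, tol=1e-6):
    """Return the sorted shared samples whose scores differ in any shared bin."""
    bad = set()
    for key in a.keys() & b.keys():
        for sample in a[key].keys() & b[key].keys():
            if abs(a[key][sample] - b[key][sample]) > tol:
                bad.add(sample)
    return sorted(bad)


HEADER = (
    "chr start end NT1_100000 NT2_100000 PT1_100000 PT2_100000 PT3_100000 "
    "PT4_100000 PT5_100000 RT1_100000 RT2_100000 RT3_100000 RT4_100000 "
    "RT5_100000\n"
)
# NT and PT columns are identical in both runs; only the RT columns differ.
SHARED = "chr6 100000 200000 29.58381 32.6497 -7.01236 36.4471 36.70081 -4.29244 38.89834 "

nt_pt_rt = read_pcori(HEADER + SHARED + "2.74258 -5.85739 8.86721 -1.87691 4.20834\n")
nt_tt = read_pcori(HEADER + SHARED + "37.81121 34.08116 40.03612 31.64915 45.43659\n")

print(mismatched_samples(nt_pt_rt, nt_tt))
# ['RT1_100000', 'RT2_100000', 'RT3_100000', 'RT4_100000', 'RT5_100000']
```

For real files, the text could be read with `open(path).read()` and passed to `read_pcori` in the same way.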

My input files are as follows.

input.NT_PT_RT.txt

/data/kun/Lava/dcHic_results/data/res100k/NT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/NT1_100000_abs.bed    NT1_100000      NT
/data/kun/Lava/dcHic_results/data/res100k/NT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/NT2_100000_abs.bed    NT2_100000      NT
/data/kun/Lava/dcHic_results/data/res100k/PT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT1_100000_abs.bed    PT1_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT2_100000_abs.bed    PT2_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT3_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT3_100000_abs.bed    PT3_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT4_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT4_100000_abs.bed    PT4_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT5_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT5_100000_abs.bed    PT5_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/RT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT1_100000_abs.bed    RT1_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT2_100000_abs.bed    RT2_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT3_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT3_100000_abs.bed    RT3_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT4_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT4_100000_abs.bed    RT4_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT5_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT5_100000_abs.bed    RT5_100000      RT

input.NT_TT.txt

/data/kun/Lava/dcHic_results/data/res100k/NT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/NT1_100000_abs.bed    NT1_100000      NT
/data/kun/Lava/dcHic_results/data/res100k/NT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/NT2_100000_abs.bed    NT2_100000      NT
/data/kun/Lava/dcHic_results/data/res100k/PT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT1_100000_abs.bed    PT1_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/PT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT2_100000_abs.bed    PT2_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/PT3_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT3_100000_abs.bed    PT3_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/PT4_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT4_100000_abs.bed    PT4_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/PT5_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT5_100000_abs.bed    PT5_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/RT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT1_100000_abs.bed    RT1_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/RT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT2_100000_abs.bed    RT2_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/RT3_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT3_100000_abs.bed    RT3_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/RT4_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT4_100000_abs.bed    RT4_100000      TT
/data/kun/Lava/dcHic_results/data/res100k/RT5_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT5_100000_abs.bed    RT5_100000      TT

input.PT_RT.txt

/data/kun/Lava/dcHic_results/data/res100k/PT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT1_100000_abs.bed    PT1_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT2_100000_abs.bed    PT2_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT3_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT3_100000_abs.bed    PT3_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT4_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT4_100000_abs.bed    PT4_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/PT5_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/PT5_100000_abs.bed    PT5_100000      PT
/data/kun/Lava/dcHic_results/data/res100k/RT1_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT1_100000_abs.bed    RT1_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT2_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT2_100000_abs.bed    RT2_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT3_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT3_100000_abs.bed    RT3_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT4_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT4_100000_abs.bed    RT4_100000      RT
/data/kun/Lava/dcHic_results/data/res100k/RT5_100000.matrix     /data/kun/Lava/dcHic_results/data/res100k/RT5_100000_abs.bed    RT5_100000      RT

ay-lab commented 1 year ago

This may be due to the PC selection step choosing the wrong PC (I am guessing that you ran the dchic --select step separately for each combination). dcHiC uses GC content and TSS correlation to select the PC and assign compartments. It seems that for NT_TT the PC selected for the RT* samples is different from the one selected in the other combinations. When you ran the dchic --select step, it should have generated a set of files recording which PC (PC1/2/3) was used for the compartment score. If you look at the RT chr6 PCs, I think you will see that they are the same for the NT_PT_RT and PT_RT runs but different for NT_TT.

The fix is easy: use the reselectpc.r script under the utility folder to reselect the PC for NT_TT so that it matches the one used for chr6 in the NT_PT_RT and PT_RT runs, like the following:

Rscript reselectpc.r --reselect man --sample _pca --chr chr6 --pc --pctype intra

That is, unless NT_TT (PT+RT) means something biologically different. Happy to help.
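To illustrate why the selected PC can change between runs, here is a toy sketch of correlation-based PC selection. It is not dcHiC's actual implementation: it uses GC content only (dcHiC also uses TSS correlation, per the explanation above), and the function names and numbers are made up for illustration. The idea is to pick the candidate PC with the strongest absolute correlation to GC content and flip its sign so that it correlates positively.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_pc(pcs, gc):
    """Pick the PC with the largest |r| against GC content, oriented so r > 0."""
    scores = {name: pearson(values, gc) for name, values in pcs.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    sign = 1.0 if scores[best] >= 0 else -1.0
    return best, [sign * v for v in pcs[best]]


# Hypothetical per-bin GC content and candidate PCs for one chromosome.
gc = [0.35, 0.40, 0.45, 0.50, 0.55]
pcs = {
    "PC1": [1.0, -1.0, 0.5, -0.5, 0.2],  # weakly related to GC content
    "PC2": [2.0, 1.0, 0.5, -1.0, -2.0],  # strongly anti-correlated with GC
}

name, oriented = select_pc(pcs, gc)
print(name, oriented)
# PC2 [-2.0, -1.0, -0.5, 1.0, 2.0]
```

Because the candidate PCs depend on which samples are analysed together, a selection rule of this kind can land on a different PC or sign for the same sample in a different grouping, which would produce mismatched scores like those reported above.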

KunFang93 commented 1 year ago

Got it! I re-ran the NT_PT_RT group and skipped dchic --select for NT_TT, and now I get the same PCs for the two runs.
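Once the PCs are consistent, the A/B counts that motivated this issue can be tallied per sample. This is a minimal sketch, assuming the usual convention that a positive compartment score means compartment A and a negative score means compartment B. The function name `count_compartments` is hypothetical, and the two bins are the PT_RT chr6 rows shown earlier in the thread.

```python
def count_compartments(text):
    """Count A (score > 0) and B (score < 0) bins per sample in pcOri text."""
    rows = [line.split() for line in text.strip().splitlines()]
    samples = rows[0][3:]  # columns after chr/start/end are sample names
    counts = {sample: {"A": 0, "B": 0} for sample in samples}
    for fields in rows[1:]:
        for sample, value in zip(samples, fields[3:]):
            score = float(value)
            if score > 0:
                counts[sample]["A"] += 1
            elif score < 0:
                counts[sample]["B"] += 1
            # A score of exactly 0 is left unassigned.
    return counts


pt_rt = """\
chr start end PT1_100000 PT2_100000 PT3_100000 PT4_100000 PT5_100000 RT1_100000 RT2_100000 RT3_100000 RT4_100000 RT5_100000
chr6 100000 200000 -7.01236 36.4471 36.70081 -4.29244 38.89834 2.74258 -5.85739 8.86721 -1.87691 4.20834
chr6 200000 300000 -7.06231 48.59131 42.51795 -12.05881 46.05725 -2.91815 -4.64475 6.15132 -1.01923 2.7069
"""

counts = count_compartments(pt_rt)
print(counts["PT1_100000"], counts["PT2_100000"], counts["RT1_100000"])
# {'A': 0, 'B': 2} {'A': 2, 'B': 0} {'A': 1, 'B': 1}
```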