Closed KunFang93 closed 1 year ago
I found files named differential.intra_sample_chrXX_combined.pcQnm.bedGraph in the DifferentialResult/PT_RT_100000/fdr_result folder. Can I use the values in these bedGraph files to infer compartment A/B for each sample? For example:
(dchic) [kun@G1400PNG-AP02LP fdr_result]$ head differential.intra_sample_chr1_combined.pcQnm.bedGraph
chr start end PT1_100000 PT2_100000 PT3_100000 PT4_100000 PT5_100000 RT1_100000 RT2_100000 RT3_100000 RT4_100000 RT5_100000 PT RT replicate_wt sample_maha pval
chr1 0 100000 0.19785 -0.63275 0.23013 0.23188 0.46472 0.59767 -0.15075 0.72631 0.31756 0.69927 0.098366 0.438012 17.0283230046668 0.0665091336067877 0.796488987359423
chr1 100000 200000 -0.15469 -0.57651 0.05701 -0.12082 0.31452 0.67771 -0.32058 0.69927 0.28352 0.48196 -0.096098 0.364376 14.6515358501899 0.321070738534486 0.570964879780533
chr1 200000 300000 -0.11972 -0.44822 0.31555 -0.06539 0.65033 0.49692 -0.09224 0.65033 0.39607 0.48794 0.06651 0.387804 23.7918212822259 0.0481633873177334 0.826290501120561
chr1 500000 600000 0.01092 -0.58774 0.45031 0.03114 0.5769 0.6126 -0.06539 0.7413 0.61868 0.58627 0.096306 0.498692 17.6671098579832 0.166739999639082 0.683025475145062
chr1 600000 700000 -0.06135 -0.60609 0.28428 0.10637 0.57016 0.50297 -0.23014 0.80647 0.51995 0.67938 0.058674 0.455726 13.5847586113175 0.153437037901881 0.695272204414064
chr1 700000 800000 0.71038 -0.24246 0.91601 0.59576 1.06156 1.365 0.40538 1.35958 1.17846 1.24159 0.60825 1.110002 6.86424456943585 0.611888238837095 0.434077739403898
chr1 800000 900000 1.22411 -0.30698 1.42875 0.88449 1.34581 1.57203 0.65253 1.50606 1.44276 1.71208 0.915236 1.377092 3.25609257789608 0.47066216300425 0.492682700247561
chr1 900000 1000000 1.70949 -0.19608 1.74719 1.37552 1.63569 1.83472 1.28828 1.66303 1.62553 2.01513 1.254362 1.685338 2.54184234207455 0.404123603910036 0.524967303293424
chr1 1000000 1100000 1.59568 -0.2504 1.62885 1.4257 1.77502 2.03347 1.01516 1.67079 1.60472 1.65377 1.23497 1.595582 1.92957884015547 0.174371347027042 0.676255718773047
So is chr1 0 100000 for PT1_100000 in compartment A, since this bin has a positive PC value (0.19785), while the next bin, chr1 100000 200000, is in compartment B, since it has a negative PC value (-0.15469)? Thanks for your help!
Best, Kun
Hi Kun
You can use the differential.intra_sample_chrXX_combined.pcOri.bedGraph file to extract the compartments. The pcQnm files contain the quantile-normalized compartment scores, which are only used to compare scores across samples and to derive the significance internally. The pcOri files contain the original scores, which represent the A (positive values) and B (negative values) compartments.
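For readers who want to script this, here is a minimal sketch of sign-based A/B labelling for one sample column. It assumes the pcOri file has a whitespace-delimited header like the pcQnm output shown above (chr, start, end, then one column per sample); that layout is an assumption, not something confirmed in this thread. The function name `label_compartments` and the toy values are made up for illustration.

```python
import io

def label_compartments(handle, sample):
    """Return (chrom, start, end, label) per bin for one sample column.

    Assumes a whitespace-delimited file whose first line is a header
    starting with chr, start, end (assumed layout, see text above).
    """
    header = handle.readline().split()
    col = header.index(sample)
    labels = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        score = float(fields[col])
        # Positive score -> A, negative score -> B, exactly zero -> undetermined
        label = "A" if score > 0 else "B" if score < 0 else "NA"
        labels.append((fields[0], int(fields[1]), int(fields[2]), label))
    return labels

# Toy input with made-up scores (not real pcOri output)
toy = io.StringIO(
    "chr start end PT1_100000 PT2_100000\n"
    "chr1 0 100000 0.8 -0.4\n"
    "chr1 100000 200000 -0.3 -0.6\n"
)
labels = label_compartments(toy, "PT1_100000")
print(labels)
```

With a real file, the `io.StringIO` object would be replaced by an opened file handle and `sample` by one of the column names in that file's header.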
Got it. Thanks for your prompt reply! I am new to compartment analysis, so please forgive me if this is a dumb question: to count the number of differential compartments, do we count the number of bins with padj below the cutoff? Or do we first combine bins with the same PC sign into a compartment, then check whether any bin in the combined region has padj below the cutoff, and finally count the number of differential bins? From Fig. 3A in the paper, I guess it would be the first one, counting bins? And to count the number of A/B compartments, do I need to combine bins with the same PC sign first and then count them? Thanks again for your help and time.
This is a very interesting question. Given how we formulated the problem, which is to compare the compartment score of a Hi-C bin across multiple samples, we needed to find the padj value for each bin separately. Combining adjacent bins with the same compartment sign within each sample will certainly not give you equal-sized regions for proper comparison across multiple samples. So, to find the differential compartments, we count the number of bins with padj below the cutoff.
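The per-bin counting described here can be sketched as follows. The column name `padj` and the cutoff of 0.05 are assumptions for illustration only; this thread does not state the actual column name or a default cutoff, and the toy values are invented.

```python
def count_differential_bins(rows, cutoff, padj_key="padj"):
    """Count bins whose adjusted p-value is below the cutoff.

    rows: iterable of dicts, one per bin; padj_key is a hypothetical
    column name and may differ in the real output files.
    """
    return sum(1 for row in rows if float(row[padj_key]) < cutoff)

# Toy bins with made-up padj values (not real dcHiC output)
toy_bins = [
    {"chr": "chr1", "start": 0, "end": 100000, "padj": 0.60},
    {"chr": "chr1", "start": 100000, "end": 200000, "padj": 0.03},
    {"chr": "chr1", "start": 200000, "end": 300000, "padj": 0.01},
]
n_differential = count_differential_bins(toy_bins, cutoff=0.05)
print(n_differential)  # 2
```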
The second option that you suggested is more likely to give you a biologically interesting region. A significantly different bin can be part of a continuous stretch of A or B compartment in which the other bins do not pass the padj threshold. Such cases may reflect a gradual change in the compartment scores across samples. For example, look at the Dach1 region in Fig. 2K of the paper. In the NPC sample, the PC value gradually changes from B to A, and so do the padj values. At some point, it crosses the padj threshold and we call it significant.
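A rough sketch of that second option: merge adjacent bins with the same PC sign into segments, then record how many bins in each segment pass the padj cutoff. This is not dcHiC's own method, only an illustration. The input tuple layout, the function name `merge_segments`, and the 0.05 cutoff are assumptions. Segments are also split at coordinate gaps, such as the missing 300000-500000 bins in the output quoted earlier in this thread.

```python
def merge_segments(bins, cutoff):
    """Merge contiguous same-sign bins into compartment segments.

    bins: list of (chrom, start, end, score, padj), sorted by position.
    Returns a list of (chrom, start, end, label, n_bins, n_significant).
    A score of exactly zero is treated as B here (arbitrary choice).
    """
    segments = []
    current = None
    for chrom, start, end, score, padj in bins:
        label = "A" if score > 0 else "B"
        sig = 1 if padj < cutoff else 0
        # Extend the open segment only if chromosome and label match
        # and the bin starts exactly where the segment ends (no gap).
        if (current is not None and current[0] == chrom
                and current[3] == label and current[2] == start):
            current[2] = end
            current[4] += 1
            current[5] += sig
        else:
            if current is not None:
                segments.append(tuple(current))
            current = [chrom, start, end, label, 1, sig]
    if current is not None:
        segments.append(tuple(current))
    return segments

# Toy bins with made-up scores and padj values
toy = [
    ("chr1", 0, 100000, -0.5, 0.60),
    ("chr1", 100000, 200000, -0.2, 0.03),
    ("chr1", 200000, 300000, 0.4, 0.01),
    ("chr1", 500000, 600000, 0.7, 0.40),
]
segments = merge_segments(toy, cutoff=0.05)
for seg in segments:
    print(seg)
```

Segments with `n_significant` greater than zero are the stretches that contain at least one differential bin. Counting all segments by label gives a per-sample count of A and B compartments.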
Got it, it makes a lot of sense. Thanks for your help!
Hi Kun
You can use the differential.intra_sample_chrXX_combined.pcOri.bedGraph file to extract the compartments. The pcQnm files contain the quantile-normalized compartment scores, which are only used to compare scores across samples and to derive the significance internally. The pcOri files contain the original scores, which represent the A (positive values) and B (negative values) compartments.
Question about this: I see that there are "intra_chr#_combined.pcOri.bedGraph" files for individual chromosomes, as well as "differential.intra_sample_group.Filtered.pcOri.bedGraph". Is the latter the merged file of all individual chromosomes, filtered by a p-value cutoff? What is the recommended or default p-value/padj cutoff? Thank you so much!
Hi,
I would like to get compartment A/B information for each sample. I noticed that in the xx_resolution/intra_pca/sample_res_mat folder, each chromosome has 10 files, e.g.,
How can I extract compartment A/B information from these files? Thanks for your help!
Best, Kun