ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License
58 stars 10 forks source link

dcHiC result #1

Closed BenxiaHu closed 3 years ago

BenxiaHu commented 3 years ago

Hi, dcHiC looks a good tool to analyze differential compartments. Have you compared dcHiC with other tools counterpart? Best,

ay-lab commented 3 years ago

Hi. A preprint is coming soon with detailed results. In terms of comparisons, we are not aware of a tool that does multi-way (>2 cell types) comparison of compartments. For pairwise comparisons, our method is finding all differences reported before which are mainly compartment flips as well as changes within a compartment (high A -> low A).

BenxiaHu commented 3 years ago

thanks. I just run the dcHiC, and otained the result. Would you explain what the case & ctrl columns mean?

chr start end case ctrl mdist dZsc pval padj

chr14 32800000 32900000 1.39196764497679 1.98860321563424 47.4894458754581 3.08855964842099 5.53008806504725e-12 8.33275306022834e-10 chr14 54600000 54700000 0.902929811279385 0.23864960511371 55.6862079152236 3.38604034058415 8.50126228736289e-14 1.91718553099108e-11 chr14 54800000 54900000 -0.0957086304259123 -0.706029881143173 37.5171340209211 3.08855964842099 9.06134744770622e-10 7.55698725042207e-08

Best,

ay-lab commented 3 years ago

The columns between "chr start end" and "mdist dZsc pval padj" in the differential compartment file are the PC values. I can't see your input file, but those should be the average PC values for the "case" replicates and "ctrl" replicates.

BenxiaHu commented 3 years ago

got it. thanks a lot.

BenxiaHu commented 3 years ago

the number of differential compartments is about 100. Do you think the low sequencing depth causes that? the validpairs for each library is about 100M. or how to increase the number of differential compartments?

or, can I just use the PC score to define compartment switching? for example, the PC score in case is greater than 0, while the PC score in control is less than 0, so this compartment undergoes swithcing.

ay-lab commented 3 years ago

Is it 100 for the entire genome or for a specific chromosome? Sequencing depth shouldn't affect results; we have done testing with this. If you just want to find compartment switches, using PC score is fine. If you want to extract significant changes in all scenarios (so compartment changes and A-->A, for instance), beyond merely changing the p-value threshold, here are a few things I can think of:

1) Visualize your data to see if there's an issue (such as large structural variations) there. This can be done with one line: Rscript igvtrack.r [Full Diff Compt Details bedGraph] [genome]. It will create a folder with an interactive HTML where you can browse results.

2) Double check that your input has groups of replicates. Check the file "chrdistances.txt," which shows the mean and standard deviation of PC value variation between replicates. dcHiC finds significant compartments by using these parameters, so if they are excessively high (mean, stdev > 0.4, for instance), that could indicate that the replicates don't have very similar PC values.

BenxiaHu commented 3 years ago

thanks for your reply. about 100 differential compartments are detected in the entire genome. only the stdev in chr21 is greater than 0.4, which indicates that the replicates in other chromosomes should befine.

ay-lab commented 3 years ago

I see. Could you share any data/result(s) with us, perhaps the compartment detail bedGraphs and IGV html page, here or by email (jeffreywang@lji.org)?

BenxiaHu commented 3 years ago

thanks. this dataset has not been published. when I ran dcHiC on another Hi-C dataset, I got this error: Positions Not Shared Across All Data Sets, Chromosome 6:

None 4167

Rscript --vanilla /nas/longleaf/home/software/dcHiC/dchic/Hier.R
BalancedChrMatrix_exp_case.1.txt BalancedChrMatrix_exp_case.2.txt BalancedChrMatrix_exp_case.3.txt BalancedChrMatrix_exp_case.4.txt BalancedChrMatrix_exp_ctrl.1.txt BalancedChrMatrix_exp_ctrl.2.txt BalancedChrMatrix_exp_ctrl.3.txt BalancedChrMatrix_exp_ctrl.4.txt 4 4 case.1 case.2 case.3 case.4 ctrl.1 ctrl.2 ctrl.3 ctrl.4 case ctrl 2 8 4167 40000 chr6 hg19 ./dcHiC/hg19_goldenpathData 2 /nas/longleaf/home/softwaredcHiC/dchic

R OUTPUT: R function done! Traceback (most recent call last): File "/nas/longleaf/home/software/dcHiC/dchic/run.py", line 318, in dmfa_file = open(name, "r") FileNotFoundError: [Errno 2] No such file or directory: 'hmfa_chrRAW_case.1_exp_1.txt' [1] "/nas/longleaf/home/software/dcHiC/dchic"

when dcHiC analyzed chr20,21,22, I did not got the aforementioned error.

ay-lab commented 3 years ago

Hmmm. To be clear, with chromosomes 20, 21, and 22, everything in dchic.py ran to the end? And in chromosome 6 this unique error comes up?

Without any data, it is difficult to make a diagnosis. However, this error is where MFA in R is being called. Usually, after "R OUTPUT:" there are several lines of system output from the MFA R script.

One big possibility is that you are running out of memory mid-way (PCA memory demand scales with size), which is why it works for chromosomes 20-22 but not for 6. I did a test and ran into an error in the same spot when limiting my RAM capacity.

If that's not it, running the R script ("Rscript --vanilla /nas/longleaf/home/software/dcHiC/dchic/Hier.R ..... ") output standalone and seeing what the error output is will help.

BenxiaHu commented 3 years ago

thanks. not only chr6, but aslo other large chromsomes, such as chr1, chr2 show the same error. yes, when I ran "Rscript --vanilla /nas/longleaf/home/software/dcHiC/dchic/Hier.R ..... " in the terminal, Hier.R outputs results. so how to deal with this issue? I set the memory to 800GB, I still got the same errors for big chromosomes.

ay-lab commented 3 years ago

Hello,

If Hier.R works fine run alone and smaller chromosomes work fine, I don't think this an error with dcHiC. Are you running the large chromosomes in parallel? Perhaps the multi-threading runs into a system limit because the 40kb matrices are already huge. Running in sequence (omitting the -parallel option) or just running everything at a lower resolution (100kb has been our default mode of analysis) may help.

BenxiaHu commented 3 years ago

thanks a lot.