ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

Some questions about Pre Processing Correlation Matrices #2

Closed biozzq closed 3 years ago

biozzq commented 3 years ago

Dear all,

Very nice tool. I would like to try it in my own project. However, I have some questions about the input used in dcHiC.

1. Why are the validPairs interactions used to generate the matrix rather than allValidPairs? The main difference between them should be that duplicates have been removed from allValidPairs, and as I understand it, duplicates should be removed.

2. If I want to start from a .hic file, I think the .hic file should be generated using hicpro2juicebox.sh (which also uses allValidPairs, not validPairs). Is this right?

3. Can fanc (https://github.com/vaquerizaslab/fanc) be used to generate the inputs for dcHiC?

Best wishes, Zheng zhuqing

ay-lab commented 3 years ago

Hello,

1) The input asks for validPairs and not allValidPairs because dcHiC's differential calling "learns" how much PC (compartment) values vary between biological replicate datasets and uses those parameters for significance thresholds. In HiC-Pro, the allValidPairs file is a combined file with reads pooled across the validPairs files of the different replicates.

If you don't have replicates, or you wish to compare just the allValidPairs files between several cell lines, that is also possible. In the "files" folder under the root directory, we have also provided mouse/human parameter files for differential calling, which were trained on gold standard datasets. In this case, you can simply use multiple allValidPairs datasets and specify a pre-trained file with "-repParams" in the dchic.py call.

2) If I understand correctly, you have HiC-Pro results. If that is the case, we do not recommend converting to .hic. Simply generate O/E correlation matrices from the HiC-Pro sparse matrix/bed files.

3) I am not familiar with fanc. However, dcHiC only requires a sparse matrix/corresponding bed file like that described at the bottom of this page to pre-process correlation matrices. It appears fanc dump may give you this result (or something close)?
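For illustration only, here is a minimal sketch of what "O/E correlation matrix from a sparse matrix/bed pair" can mean. It is not dcHiC's own preprocessing code, and the expected-value calculation (a plain mean per genomic distance) is an assumption that may differ from what dcHiC does. It assumes the usual HiC-Pro layout: a bed file with `chrom start end bin_id` and a sparse matrix with `bin_i bin_j count`. All function names and the toy data are hypothetical.

```python
import math

def read_bins(bed_lines, chrom):
    """Collect the bin ids that belong to one chromosome (bed: chrom start end bin_id)."""
    bins = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0] == chrom:
            bins.append(int(fields[3]))
    return sorted(bins)

def dense_matrix(matrix_lines, bins):
    """Build a symmetric dense matrix from sparse 'bin_i bin_j count' lines."""
    index = {b: k for k, b in enumerate(bins)}
    n = len(bins)
    m = [[0.0] * n for _ in range(n)]
    for line in matrix_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        i, j, count = int(fields[0]), int(fields[1]), float(fields[2])
        if i in index and j in index:  # keep intra-chromosomal contacts only
            a, b = index[i], index[j]
            m[a][b] = count
            m[b][a] = count
    return m

def observed_over_expected(m):
    """Divide each entry by the mean count at its genomic distance (assumed definition)."""
    n = len(m)
    oe = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diagonal = [m[k][k + d] for k in range(n - d)]
        expected = sum(diagonal) / len(diagonal)
        if expected > 0:
            for k in range(n - d):
                oe[k][k + d] = oe[k + d][k] = m[k][k + d] / expected
    return oe

def pearson(x, y):
    """Pearson correlation of two equal-length rows; 0.0 if either row is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(oe):
    """Row-by-row Pearson correlation of the O/E matrix."""
    n = len(oe)
    return [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]

# Toy input (made-up numbers), three 100 kb bins on chr1 and one on chr2.
bed_lines = [
    "chr1\t0\t100000\t1",
    "chr1\t100000\t200000\t2",
    "chr1\t200000\t300000\t3",
    "chr2\t0\t100000\t4",
]
matrix_lines = ["1 1 10", "1 2 4", "1 3 2", "2 2 8", "2 3 6", "3 3 12", "1 4 1"]

bins = read_bins(bed_lines, "chr1")
oe = observed_over_expected(dense_matrix(matrix_lines, bins))
corr = correlation_matrix(oe)
for row in corr:
    print("\t".join(f"{value:.3f}" for value in row))
```

In practice the input lines would come from the `.matrix` and `_abs.bed` files that HiC-Pro writes for each resolution, and a real dataset would need an array library rather than nested lists.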

We hope you find our tool useful! Please let us know if you have any other questions or other issues arise as you use it :).

biozzq commented 3 years ago

Dear @ay-lab

Thank you for your explanation. I have only generated the validPairs and allValidPairs files for each sample using HiC-Pro, without running build_contact_maps and ice_norm. So I first need to convert the validPairs to sparse matrix/bed files for each replicate using buildmatrix, is that right?

Also, if I had finished running build_contact_maps and ice_norm, it would generate two matrix files for each resolution: one raw and the other balanced by ICE. Which matrix should I use in dcHiC?

Best wishes, Zheng zhuqing

ay-lab commented 3 years ago

So to get the dcHiC preprocessing result, do these steps:

biozzq commented 3 years ago

Dear @ay-lab

Thanks for the information.

Best wishes, Zheng zhuqing