How to generate "-repParams" file

haizi-zh commented 3 years ago

Hello,

Would you please tell me how to generate the -repParams file?

In this issue (https://github.com/ay-lab/dcHiC/issues/2) you mentioned:

In this case, you can simply use multiple allValidPairs data and specify a pre-trained file with "-repParams" in the dchic.py call.

I took a look into both pre-trained files, https://github.com/ay-lab/dcHiC/blob/master/files/humanparams.txt and https://github.com/ay-lab/dcHiC/blob/master/files/miceparams.txt, and I guess the data is about the compartment "fluctuation" in each chromosome, is that correct? However, I still don't understand what the "m" and "s" columns are about.

In my project, the compartment profiles are not generated from Hi-C maps in the traditional way, but inferred from other epigenetic marks (this is what my project is about). Therefore, I would prefer not to use the pre-trained files your project provides directly, but rather to generate my own. Would you please let me know the meaning of the "m" and "s" columns, and how to generate them?

Thanks!

ay-lab commented 3 years ago

Hi There,

Thank you for the question. If I understand correctly, you wish to generate compartment profiles without MFA and then use the differential calling segment of our pipeline. Firstly, I would say this may be a bit risky because the MFA normalization is intended to make the compartments comparable. If you wish to do it, however, it should be possible—although I would definitely recommend that you run the dcHiC pipeline completely first to get a sense of what files/formats/directories you will need to put in place.

For the parameter text files you reference: dcHiC determines the biological relevance of differential compartments by estimating the variability of PC values between user-defined replicate cell lines and then using those variability parameters to determine which differential compartments are of biological significance. The precise methodology is defined in our paper, but briefly it goes as follows:

1. Pairwise comparisons of the PC values of replicates (as defined by the user) are taken, and a linear model is fit to each pair.
2. The distance from each point to the fitted center line is measured, and the mean and standard deviation ("m" and "s") of these distance values are calculated.
3. These distances define a biological variation score that is used as a covariate, alongside p-values from a multivariate Gaussian, in Independent Hypothesis Weighting (IHW). This outputs the final FDR-adjusted p-values for each compartment.
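
The fit-and-measure part of this procedure (computing "m" and "s" for one pair of replicates) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the dcHiC implementation: the function name `replicate_variability` is hypothetical, the distance is assumed to be the absolute perpendicular distance to an ordinary least-squares line, and the example PC values are made up. The paper and the dcHiC source remain the reference for the exact distance definition and for the layout of the parameter file.

```python
import math
import statistics


def replicate_variability(pc_a, pc_b):
    """Return (m, s) for one pair of replicate PC tracks.

    Hypothetical helper: fits pc_b ~ slope * pc_a + intercept by ordinary
    least squares, then summarises the point-to-line distances.
    """
    if len(pc_a) != len(pc_b) or len(pc_a) < 3:
        raise ValueError("need two equally long tracks with at least 3 bins")

    mean_a = statistics.fmean(pc_a)
    mean_b = statistics.fmean(pc_b)
    sxx = sum((a - mean_a) ** 2 for a in pc_a)
    if sxx == 0:
        raise ValueError("pc_a is constant; cannot fit a line")
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(pc_a, pc_b))

    # Least-squares line through the replicate-vs-replicate scatter.
    slope = sxy / sxx
    intercept = mean_b - slope * mean_a

    # ASSUMPTION: absolute perpendicular distance from each point to the line.
    norm = math.hypot(slope, 1.0)
    distances = [
        abs(slope * a - b + intercept) / norm for a, b in zip(pc_a, pc_b)
    ]

    # "m" is the mean of the distances, "s" their sample standard deviation.
    return statistics.fmean(distances), statistics.stdev(distances)


# Made-up PC values for two replicates of the same chromosome.
rep1 = [-1.2, -0.4, 0.3, 0.9, 1.5]
rep2 = [-1.0, -0.5, 0.4, 1.0, 1.3]
m, s = replicate_variability(rep1, rep2)
print(f"m={m:.4f} s={s:.4f}")
```

Since the pre-trained files appear to hold values per chromosome, a function like this would presumably be run once per chromosome and replicate pair. How several pairs are combined is not covered in this thread.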

haizi-zh commented 3 years ago

Thanks for the excellent explanation!

How to generate "-repParams" file #7