Closed zhixuqiu closed 5 years ago
I'm also wondering if hic2cool applies matrix balancing normalization using cooler. I generated a .multi.cool file from a .hic file, and it includes all the normalizations that were present in the .hic file, plus a normalization called "default" which was not in the .hic file.
Cooler stores raw bin counts and applies balancing weights "on the fly" as needed. Balancing weights are stored as part of the bins table. Check the cooler schema for reference: https://cooler.readthedocs.io/en/latest/schema.html#data-collection
For example, balancing is automatically applied when viewing an .mcool in HiGlass, or when fetching snippets of the matrix using the cooler Python API: https://cooler.readthedocs.io/en/latest/api.html#cooler.Cooler.matrix
Thus the hic2cool behaviour is what you'd expect.
Also beware that .hic files usually carry several different balancing weights, so you'd get all of them in your cooler. A default one is used unless you say otherwise; to use another one, look for the specific option in the cooler CLI commands, or pass something like balance="weightcol_name" when fetching snippets of a matrix in your custom Python scripts.
Hope this helps
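To make the "on the fly" idea concrete, here is a minimal pure-Python sketch of what selecting a weight column amounts to. It is not cooler's implementation: the bins and pixels structures, the numbers in them, and the balanced_pixels helper are all hypothetical stand-ins. In cooler itself the corresponding choice is made through the balance argument of Cooler.matrix mentioned above.

```python
# Toy stand-ins for a cooler's tables (all values are made up for illustration).
bins = {
    "weight": [0.5, 1.0, 2.0],  # e.g. a column written by `cooler balance`
    "KR": [0.4, 1.1, 1.9],      # e.g. a vector carried over from a .hic file
}
# Upper-triangle pixels: (bin1_id, bin2_id, raw count)
pixels = [(0, 0, 8), (0, 1, 4), (1, 2, 3), (2, 2, 1)]


def balanced_pixels(pixels, bins, balance="weight"):
    """Return pixels with multiplicative weights applied; raw counts stay untouched."""
    if balance is None:
        # No balancing requested: hand back the raw counts as floats.
        return [(i, j, float(c)) for i, j, c in pixels]
    w = bins[balance]  # pick the requested weight column by name
    return [(i, j, c * w[i] * w[j]) for i, j, c in pixels]


raw = balanced_pixels(pixels, bins, balance=None)
default = balanced_pixels(pixels, bins)              # uses the "weight" column
kr = balanced_pixels(pixels, bins, balance="KR")     # uses a named column instead
```

The stored counts never change; only the view you fetch depends on which column you name.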
Thank you @sergpolly for the explanation.
@DoaneAS, could you send me the command you used to interrogate your hic2cool .multi.cool file? I don't see a normalization named "default" in the bins table of my output files, just the weights corresponding to the .hic normalization vectors (KR, VC, VC_SQRT...)
Hi @DoaneAS. As @sergpolly pointed out, all normalization vectors are stored in the bin table, separately from the raw counts.
@carlvitzthum, I believe the "default" comes from HiGlass's transforms menu. HiGlass will default to applying the normalization vector called weight if it exists (listed as ICE in HiGlass); otherwise the "default" normalization should be "None", i.e. raw counts.
weight is the default name of the output from running cooler balance, which does standard matrix balancing. This is the same idea as KR, and the results should be about the same up to bin-level filtering and the value scale: the default normalization done by cooler is genome-wide and rescales the weights such that the marginals sum to 1. However, all of this can be customized.
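As a rough illustration of what "matrix balancing" means here, the sketch below finds multiplicative weights for a small symmetric matrix so that every row of the balanced matrix sums to 1. It is a toy iteration written for this thread, not cooler's actual algorithm (which also handles bin filtering, convergence checks, and sparse storage). The function name and the counts are hypothetical, and it assumes every bin has a nonzero marginal.

```python
def balance_weights(matrix, n_iter=200):
    """Iteratively find weights w so that sum_j matrix[i][j] * w[i] * w[j] == 1."""
    n = len(matrix)
    w = [1.0] * n
    for _ in range(n_iter):
        # Row sums (marginals) of the matrix balanced with the current weights.
        marg = [
            sum(matrix[i][j] * w[i] * w[j] for j in range(n))
            for i in range(n)
        ]
        # Damped update: dividing by the square root of the marginal keeps
        # the symmetric iteration from oscillating.
        w = [w[i] / marg[i] ** 0.5 for i in range(n)]
    return w


# Made-up symmetric counts with no empty bins.
counts = [
    [10.0, 4.0, 2.0],
    [4.0, 8.0, 3.0],
    [2.0, 3.0, 6.0],
]
weights = balance_weights(counts)
marginals = [
    sum(counts[i][j] * weights[i] * weights[j] for j in range(3))
    for i in range(3)
]
```

After enough iterations each entry of marginals should be very close to 1, which is the scale convention described above.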
One notable difference is that cooler uses multiplicative weights, so balanced = count * weight1 * weight2. The .hic vectors are divisive biases, so balanced = count / bias1 / bias2. hic2cool currently inverts the hic biases into multiplicative weights. This is unfortunately inconsistent with HiGlass, which expects KR, VC and VC_SQRT to be divisive, not multiplicative. So if KR, etc. look funny in HiGlass, that would be the issue.
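A small worked example of the two conventions, with made-up numbers, shows why mixing them up looks "funny": a vector stored as multiplicative weights but applied as if it were divisive gives a wildly different value.

```python
count = 16.0

# Divisive biases, the convention used by .hic normalization vectors.
bias_i, bias_j = 2.0, 4.0

# The same information as multiplicative weights, the convention cooler uses.
weight_i, weight_j = 1.0 / bias_i, 1.0 / bias_j

hic_style = count / bias_i / bias_j         # divisive convention
cooler_style = count * weight_i * weight_j  # multiplicative convention, same result

# What a consumer expecting divisive biases computes when it is handed
# the inverted (multiplicative) values instead.
mismatched = count / weight_i / weight_j

print(hic_style, cooler_style, mismatched)  # 2.0 2.0 128.0
```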
FYI @carlvitzthum will be deprecating this inversion behavior in the next version of hic2cool, and will provide a way to re-extract the hic norms as divisive ones. In the meantime, you can try re-inverting those columns back manually in Python using h5py (I can send you a short script to do this), or just run cooler balance on each zoom level.
I have just released version 0.5.0, which deprecates the inversion of hic vectors in the output cooler files. Please run hic2cool update <cooler filepath> to get your files caught up. See the docs for more info.
Closing this issue. Please open a new one if something comes up when using the new version.
Hi, I converted my Hi-C matrix from .hic format to .cool format, but the read counts in the new cool file are the original raw counts, not the normalized values. Could you tell me how to output normalized counts? My command is