deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

Questions about non-homogeneous bin size, hicPlotMatrix with different normalization values within .cool/.mcool #254

Closed kalavattam closed 6 years ago

kalavattam commented 6 years ago

Dear HiCExplorer team,

I'm using version 2.1.3 (Python 3.6 environment).

Can you help me interpret the 'WARNING:hicexplorer.HiCMatrix:Bin size is not homogeneous' message printed to stderr when running hicInfo? I sometimes see it with .h5 matrices at certain binned resolutions (e.g., 100 kb, 250 kb) that came from .cool files via hicExport; the .cool files came from hic2cool (version 0.4.1). Also, hicInfo sometimes says the bin size is not homogeneous at certain binned resolutions for .h5 files but not for the .cool files that they were derived from.

And one more question: Is it possible to call hicPlotMatrix on particular normalization vectors within .cool or .mcool matrices that came from .hic files via hic2cool? For example, can I call hicPlotMatrix with some version of the 'filename.cool::node/info' filename input to do this?

Thanks! Kris

joachimwolff commented 6 years ago

Dear Kris,

thanks for using HiCExplorer.

The warning means that not all bins in your binned contact matrix have the same size. This can happen when, e.g., the end of a chromosome does not match the bin size. To give you a hint about this, we print the median bin size. Is the median very close to the expected bin size? If so, this would point to a non-fitting bin at the end of a chromosome. However, the warning should occur for identical data independent of its storage format. Maybe you can send me an example matrix where it happens with h5 but not with cool?
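To illustrate the explanation above, here is a minimal Python sketch of why fixed-width binning produces a shorter last bin per chromosome while the median bin size stays at the nominal value. The chromosome names and lengths are made up for illustration, and this is not how HiCExplorer computes the warning internally.

```python
from statistics import median


def bin_sizes(chrom_length, bin_size):
    """Return the sizes of fixed-width bins tiling one chromosome."""
    sizes = []
    start = 0
    while start < chrom_length:
        # The last bin is truncated at the chromosome end.
        end = min(start + bin_size, chrom_length)
        sizes.append(end - start)
        start = end
    return sizes


# Hypothetical chromosome lengths, not taken from any real genome.
chrom_lengths = {"chrA": 1_050_000, "chrB": 730_000}

all_sizes = []
for length in chrom_lengths.values():
    all_sizes.extend(bin_sizes(length, 100_000))

homogeneous = len(set(all_sizes)) == 1
median_size = median(all_sizes)

print("homogeneous:", homogeneous)
print("median bin size:", median_size)
```

If the median matches the requested resolution and only the last bin of each chromosome differs, the warning is most likely harmless.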

Your second question: you can open a node of a cooler or multi-cooler file by specifying the exact path. If you don't know the path, don't worry: HiCExplorer will tell you which nodes are available.

I hope I could help and don't hesitate to ask more questions.

Best,

Joachim

kalavattam commented 6 years ago

Joachim,

Thanks for your quick and helpful advice. After spending some time to troubleshoot my first question, I realized that the discrepancy between .h5 and .cool files arose from an error I made.

With regard to my second question, cooler and multi-res cooler files derived from .hic files via hic2cool contain multiple normalization vectors (for example, one for the Lieberman lab's vanilla coverage, one for the Knight-Ruiz normalization, etc.). Is it possible to use HiCExplorer to access these cooler/multi-res cooler files and plot a matrix with respect to a particular form of normalization? For example, to access a multi-resolution cooler file, choose the 25-kb binned resolution, and then plot a Knight-Ruiz normalized version?

Thanks! Kris

joachimwolff commented 6 years ago

Dear Kris,

This is not possible at the moment. I have to fix a cooler issue today anyway, so maybe I will add this feature too and release a new bug-fix version.

Best,

Joachim

joachimwolff commented 6 years ago

Dear Kris,

I implemented your request for a few classes. You can use the branch 'cooler_correction_patch' to test it. The script 'hicInfo' gives you information about which correction factors are stored and what their column names are. You can then call the plotting with, e.g., `hicPlotMatrix --matrix foo_hic.multi.cool::/resolutions/1000000 --outFileName cooler_fix_kr_div.png --log1p --correction_name KR --correction_operation '*'`.

I apply the correction factors by calculating `New[i,j] = data[i,j] * correction[i] * correction[j]`. In some cases the correction factors expect a division rather than a multiplication, so I added the option `--correction_operator`, which accepts '/' or '*', to change it to `New[i,j] = data[i,j] / (correction[i] * correction[j])`. If no correction should be applied, set `--correction_name` to `None`. This is useful if you want to use, e.g., an unbiased correction with hicCorrectMatrix.
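As a rough illustration of the formula above, the following Python sketch applies per-bin correction factors to a small dense matrix. It assumes that the division variant divides by the product of the two factors; the function name `apply_correction` and the example values are hypothetical and are not part of HiCExplorer.

```python
def apply_correction(data, correction, operation="*"):
    """Apply per-bin correction factors to a square matrix (list of lists).

    operation "*": new[i][j] = data[i][j] * correction[i] * correction[j]
    operation "/": new[i][j] = data[i][j] / (correction[i] * correction[j])
    """
    if operation not in ("*", "/"):
        raise ValueError("operation must be '*' or '/'")
    corrected = []
    for i, row in enumerate(data):
        new_row = []
        for j, value in enumerate(row):
            factor = correction[i] * correction[j]
            if operation == "*":
                new_row.append(value * factor)
            else:
                new_row.append(value / factor)
        corrected.append(new_row)
    return corrected


# Made-up 2x2 contact counts and correction factors.
counts = [[10.0, 4.0], [4.0, 8.0]]
weights = [0.5, 2.0]

multiplied = apply_correction(counts, weights, "*")
divided = apply_correction(counts, weights, "/")
print(multiplied)
print(divided)
```

Which operation is appropriate depends on how the tool that wrote the vector defines it, so comparing both results against a known-good plot is a reasonable sanity check.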

Best,

Joachim

kalavattam commented 6 years ago

Thanks, Joachim! I'll test it out and let you know how it goes.

-Kris

kalavattam commented 6 years ago

Hi Joachim,

I apologize for the delay on testing the patch. I've run into some Python environment issues that I am still working out.

In the meantime, I wanted to ask about another feature. To normalize for read depth between samples from different biological conditions, I want to down-sample matrices with more reads than others so that the sums of all normalized matrices equal the same value. Also, I'd like to down-sample matrices to test the robustness of TAD calling. Is there a means to do this through HiCExplorer (I apologize if I have missed this)? If not, can I request this feature, or do you have any recommendation on other means to do this?

Thanks again; the HiCExplorer software has been very helpful for my research, Kris

joachimwolff commented 6 years ago

Dear Kris,

I am glad to hear that HiCExplorer is supporting your research. For Python environment issues I can recommend conda environments.

> In the meantime, I wanted to ask about another feature. To normalize for read depth between samples from different biological conditions, I want to down-sample matrices with more reads than others so that the sums of all normalized matrices equal the same value

This is not possible with HiCExplorer so far, I will add this feature to my ToDo list.

> Also, I'd like to down-sample matrices to test the robustness of TAD calling

I do not understand what you want to do here. You can use HiCExplorer to change the resolution of the matrix (hicMergeMatrixBins) if that is what you mean by 'downsampling'; otherwise, can you please explain in more detail what you want to do?

Best,

Joachim

kalavattam commented 6 years ago

Thanks, Joachim. I meant reducing the sum of normalized matrices and testing TAD calling with respect to the new values.
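For readers following along, one common way to reduce a matrix to a fixed total is to sample contacts without replacement from the raw integer counts. The sketch below shows that idea in plain Python; it is not a HiCExplorer feature, the function name `downsample_counts` and the example counts are hypothetical, and it assumes raw (uncorrected) integer counts rather than normalized values.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def downsample_counts(counts, target_total, seed=None):
    """Keep `target_total` contacts drawn without replacement.

    `counts` maps a bin pair (i, j) to an integer contact count.
    """
    keys = list(counts)
    cumulative = list(accumulate(counts[key] for key in keys))
    total = cumulative[-1] if cumulative else 0
    if not 0 <= target_total <= total:
        raise ValueError("target_total must be between 0 and the matrix sum")
    rng = random.Random(seed)
    kept = dict.fromkeys(keys, 0)
    # Each integer in range(total) stands for one individual contact;
    # the cumulative sums map it back to the bin pair it belongs to.
    for contact_index in rng.sample(range(total), target_total):
        kept[keys[bisect_right(cumulative, contact_index)]] += 1
    return kept


# Made-up sparse count matrix with a total of 100 contacts.
example = {(0, 0): 50, (0, 1): 30, (1, 1): 20}
reduced = downsample_counts(example, 40, seed=1)
print(sum(reduced.values()))
```

Down-sampling two samples to the same total before correction, and then calling TADs on each, is one way to check how sensitive the calls are to sequencing depth.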

-Kris

vivekbhr commented 6 years ago

@kalavattam under the script directory in HiCExplorer, we have a program called hicComputeSaturation.py. It is a wrapper that uses the HiCExplorer API to downsample the matrices to different depths and call TADs. You can try that.

kalavattam commented 6 years ago

@joachimwolff and @vivekbhr, thanks for your programs and advice. I set up an environment to test the cooler_correction_patch via pip (trying to plot KR-normalized contact maps from a multi-cool file from hic2cool), but I'm getting traceback errors regarding pyBigWig.so and glibc 2.14; when I import a glibc 2.14 module, I get segmentation faults. I'm not sure if you have any advice about this. I will discuss it with my system admins too.

ImportError: /lib64/libc.so.6: version 'GLIBC_2.14' not found (required by /users/ala1zp/anaconda3/envs/py27_hicexp_cooler_correction_patch_env/lib/python2.7/site-packages/pyBigWig.so)

I am also testing .cool files balanced with cooler in a HiCExplorer 2.1.4 environment. These jobs seem to complete successfully, but my .png outputs show no contacts. I show how I call the command below, and I have attached my shell scripts, stderr, and stdout; if possible, can you point out whether I am doing anything wrong or provide any advice?

Thank you!

$   hicPlotMatrix \
    --matrix pooled_100kb-res.cool \
    --title pooled_100kb-res_v2-1-4_cool_test \
    --chromosomeOrder chr1 \
    --perChromosome \
    --outFileName pooled_100kb-res_v2-1-4_cool_test.png \
    --colorMap RdYlBu_r \
    --log1p \
    --vMax 1000 \
    --dpi 300

[Image: pooled_100kb-res_v2-1-4_cool_test]

Attachments: hicPlotMatrix_env-py27-v2-1-4_cool-bal_pooled_chr1_100kb_061118.sh.675706.err.txt, hicPlotMatrix_env-py27-v2-1-4_cool-bal_pooled_chr1_100kb_061118.sh.675706.out.txt, hicPlotMatrix_env-py27-v2-1-4_cool-bal_pooled_chr1_100kb_061118.sh.txt

joachimwolff commented 6 years ago

Hi Kris,

Indeed, this plot does not look as it should, although there is some data on the diagonal if you look closely. It seems that log1p is not applied, even though I can see you used this parameter. In my experience, data from .hic files, i.e. from Juicer, always looks a bit too 'clean': there are few to no long-range contacts. Maybe it helps to remove the vMax value? How does the data look if you zoom in a bit? Moreover, I think no correction is applied; maybe you can use our hicCorrectMatrix tool to do so? And a last thought: can you confirm (e.g., from a publication) that this is not how the plot should look? What do the quality reports say? Maybe the data is just bad.

Concerning your first issue: You can try the following:

I hope I could help you a bit.

Best,

Joachim

gtrichard commented 6 years ago

By the way, isn't --vMax 1000 too high? How does the plot look with --vMax 100?

fidelram commented 6 years ago

I suggest using vMax = 50

kalavattam commented 6 years ago

Thanks @joachimwolff and all for your explanations and advice. I have the cooler_correction_patch branch working now and the data look consistent between Juicer and HiCExplorer.

Update: Thanks again, I've resolved all issues/questions!