deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

KeyError when hicAggregateContacts #339

Closed QianzhaoJ closed 5 years ago

QianzhaoJ commented 5 years ago

I want to plot one loop, which was called by Juicer, using hicAggregateContacts. But when I run

hicAggregateContacts --matrix ../data/WSKAT7_10000_norm.h5 --BED ./loop_test.bed \
    --outFileName kat7_loop \
    --range 500000:1000000 --numberOfBins 50 --chromosomes chr4 \
    --avgType mean --transform obs/exp

some errors occurred:

INFO:hicexplorer.hicAggregateContacts:checking range 500000-1000000
INFO:hicexplorer.hicAggregateContacts:Computing observed vs. expected matrix. This may take a while.
INFO:hicmatrix.HiCMatrix:processing chromosome chr4
INFO:hicexplorer.hicAggregateContacts:processing chr4
WARNING:hicexplorer.hicAggregateContacts:No valid submatrices were found for chrom: chr4
Traceback (most recent call last):
  File "/data1/liuzunpeng/04_Softwares/anaconda/bin/hicAggregateContacts", line 7, in <module>
    main()
  File "/data1/liuzunpeng/04_Softwares/anaconda/lib/python2.7/site-packages/hicexplorer/hicAggregateContacts.py", line 638, in main
    format(over_1_5, float(over_1_5) / len(chrom_matrix[chrom])))
KeyError: 'chr4'

I guess there is some problem in my loop_test.bed, but I don't know what to do. Can you give me some suggestions? My loop_test.bed:

chr4 19980000 19985000 chr4 20250000 20255000

The raw loop entry from Juicer:

chr1 x1 x2 chr2 y1 y2 name score strand1 strand2 color observed expectedBL expectedDonut expectedH expectedV fdrBL fdrDonut fdrH fdrV numCollapsed centroid1 centroid2 radius
4 19980000 19985000 4 20250000 20255000 . . . . 0,255,255 27 4.035418 3.906886 3.897664 5.274288 5.300583e-09 6.151308e-11 1.004494e-10 3.558812e-07 3 19985833 20250833 3727

Thanks for your time ! Best Qianzhao

joachimwolff commented 5 years ago

WARNING:hicexplorer.hicAggregateContacts:No valid submatrices were found for chrom: chr4

and

format(over_1_5, float(over_1_5) / len(chrom_matrix[chrom])))
KeyError: 'chr4'

To explain: you want to use a region on chr4 for the detection, but the matrix you use contains no data for chr4, and this causes the crash. To fix it, please use a matrix containing data for chr4. Maybe the chromosome naming in your matrix is different and you need to search for 4 instead of chr4.
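A quick way to test the naming hypothesis is to compare the chromosome names used in the BED file with those present in the matrix. The following is a minimal sketch and not part of HiCExplorer: `missing_chromosomes` and `toggle_chr_prefix` are hypothetical helpers, and the list of matrix chromosome names is assumed to have been obtained separately.

```python
def missing_chromosomes(bed_lines, matrix_chroms):
    """Return BED chromosome names (column 1) that are absent from the matrix."""
    bed_chroms = {
        line.split()[0]
        for line in bed_lines
        if line.strip() and not line.startswith("#")
    }
    return sorted(bed_chroms - set(matrix_chroms))


def toggle_chr_prefix(name):
    """Switch between 'chr4' and '4' style names (assumes a plain 'chr' prefix)."""
    return name[3:] if name.startswith("chr") else "chr" + name


# Example values taken from this thread; the matrix names are an assumption.
bed = ["4 19980000 19985000", "4 20250000 20255000"]
matrix_names = ["chr4"]

for name in missing_chromosomes(bed, matrix_names):
    print(name, "not in matrix; try", toggle_chr_prefix(name))
```

If nothing is reported as missing, the names already agree and the cause has to be looked for elsewhere.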

QianzhaoJ commented 5 years ago

But when I run

hicConvertFormat -m ../data/WSKAT7_10000_norm.h5 -o WSKAT7_10000_norm --inputFormat h5 --outputFormat ginteractions

and view the result with

grep '^chr4' WSKAT7_10000_norm.tsv | less

I find chr4 in the matrix, like this:

chr4 10000 20000 chr4 10000 20000 176.789978027
chr4 10000 20000 chr4 20000 30000 19.8356132507
chr4 10000 20000 chr4 30000 40000 6.8191280365
chr4 10000 20000 chr4 40000 50000 4.73732614517
chr4 10000 20000 chr4 50000 60000 2.52131295204
chr4 10000 20000 chr4 60000 70000 3.62856006622
chr4 10000 20000 chr4 70000 80000 3.41786909103
chr4 10000 20000 chr4 80000 90000 11.6719551086
chr4 10000 20000 chr4 110000 120000 5.89985084534
chr4 10000 20000 chr4 120000 130000 13.3835811615

And when I try to search for 4 instead of chr4 by editing my loop_test.bed to

4 19980000 19985000 4 20250000 20255000

the error occurs too:

INFO:hicexplorer.hicAggregateContacts:checking range 500000-1000000
INFO:hicexplorer.hicAggregateContacts:Computing observed vs. expected matrix. This may take a while.
INFO:hicmatrix.HiCMatrix:processing chromosome chr4
Traceback (most recent call last):
  File "/data1/liuzunpeng/04_Softwares/anaconda/bin/hicAggregateContacts", line 7, in <module>
    main()
  File "/data1/liuzunpeng/04_Softwares/anaconda/lib/python2.7/site-packages/hicexplorer/hicAggregateContacts.py", line 657, in main
    cluster_ids[chrom] = [range(len(chrom_matrix[chrom]))]
KeyError: 'chr4'

So perhaps the KeyError has some other cause?

joachimwolff commented 5 years ago

Hi,

the KeyError is not occurring because there is no data for chr4 in the matrix. We do some computations first, and their result is that too little data is left for chr4 to get a good result.

We are investigating the issue in detail at the moment, and I hope we can give you a more detailed explanation soon.

Best,

Joachim

gtrichard commented 5 years ago

Your loop_test.bed contains these regions:

chr4    19980000 19985000
chr4    20250000 20255000

They are 275 kb apart, but you are looking for submatrices located in this range: --range 500000:1000000

Since the submatrix center is outside the range you want to look at, I guess that is why hicAggregateContacts is crashing. Please try another range, like --range 100000:1000000
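The mismatch can be checked by hand before rerunning. The sketch below is only an illustration of the arithmetic, not of how hicAggregateContacts works internally: the helper names are hypothetical, measuring the distance between region midpoints is an assumption, and so is treating both bounds of the range as inclusive. Measured midpoint to midpoint, the two regions are 270 kb apart, close to the 275 kb quoted above, and in either case below the 500 kb lower bound.

```python
def parse_range(text):
    """Split a 'low:high' string, as passed to --range, into two integers."""
    low, high = (int(part) for part in text.split(":"))
    return low, high


def anchor_distance(region_a, region_b):
    """Distance in bp between the midpoints of two (start, end) regions."""
    mid_a = (region_a[0] + region_a[1]) // 2
    mid_b = (region_b[0] + region_b[1]) // 2
    return abs(mid_b - mid_a)


def in_range(distance, range_text):
    """True if distance lies within the range (inclusive bounds assumed)."""
    low, high = parse_range(range_text)
    return low <= distance <= high


# The two regions from loop_test.bed in this thread.
dist = anchor_distance((19980000, 19985000), (20250000, 20255000))
print(dist)                               # 270000
print(in_range(dist, "500000:1000000"))   # False
print(in_range(dist, "100000:1000000"))   # True
```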

Also, keep in mind that hicAggregateContacts is designed to plot the mean or median of several submatrices. If you want to plot a loop that is not far from the diagonal, please consider hicPlotMatrix, hicPlotTADs, or even pyGenomeTracks:

https://github.com/deeptools/pyGenomeTracks

gtrichard commented 5 years ago

Any update @QianzhaoJ ?

QianzhaoJ commented 5 years ago

Yes, I tried a smaller range, and it works! Thanks a lot! Best, Qianzhao