Closed FlavianR closed 12 months ago
To add, I tried to used dvsd multiple, like bellow:
python ../diffDomain/diffdomain-py3/diffdomains.p dvsd multiple data1.mcool data2.mcool TAD_list_data1.bed --reso 40000
In the TAD_list_data1.bed, I got all the domains find by the tool SpectralTADs on all the chromosoms. But I get an error :
Traceback (most recent call last):
File "/env/products/python/3.8.11/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/env/ig/soft/eco3d/diffdomain/2023.11.09/bin/../lib/python3.8/site-packages/diffdomain_py3/diffdomains.py", line 57, in comp2domins_by_twtest_parallel
tmp_res = comp2domins_by_twtest(chrn=tadb.iloc[i, 0], start=tadb.iloc[i, 1],
File "/env/export/v_info/q_soft/eco3d/diffdomain/2023.11.09/lib/python3.8/site-packages/diffdomain_py3/utils.py", line 357, in comp2domins_by_twtest
mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
File "/env/export/v_info/q_soft/eco3d/diffdomain/2023.11.09/lib/python3.8/site-packages/diffdomain_py3/utils.py", line 213, in contact_matrix_from_hic
mat = c.matrix(balance=True).fetch(f'{chrn}:{start}-{end}')
File "/env/ig/soft/eco3d/hicexplorer/3.7.2/lib/python3.8/site-packages/cooler/core/_selectors.py", line 149, in fetch
i0, i1, j0, j1 = self._fetch(*args, **kwargs)
File "/env/ig/soft/eco3d/hicexplorer/3.7.2/lib/python3.8/site-packages/cooler/api.py", line 392, in _fetch
region1 = parse_region(region, self._chromsizes)
File "/env/ig/soft/eco3d/hicexplorer/3.7.2/lib/python3.8/site-packages/cooler/util.py", line 178, in parse_region
raise ValueError(f"Genomic region out of bounds: [{start}, {end})")
ValueError: Genomic region out of bounds: [132160000, 133800000)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/env/ig/soft/eco3d/diffdomain/2023.11.09/bin/../lib/python3.8/site-packages/diffdomain_py3/diffdomains.py", line 76, in <module>
result.append(i.get())
File "/env/products/python/3.8.11/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: Genomic region out of bounds: [132160000, 133800000)
After some research, I understood that this error is because when I gave a domain you took on a region around to calculte the differential score. But when we are like this at the end of the chromsome it's doesn't work.
Moreover the error "The matrix is too spase or too small at this resolution to be calculated !" keep come back but I don't a mention about this kind of limitation.
If you can help me, I thank you in advance.
A response to the error "The matrix is too spase or too small at this resolution to be calculated !" keep come back but I don't a mention about this kind of limitation.
It is a warning message. It is uncommon that Hi-C data at a given resolution, particularly high-resolution say 10 kb, are sparse in some TADs. We use two parameters to deal with this situation: --min_nbin
and --f
. Briefly, after preprocessing and data imputation, the two Hi-C contact matrices for a particularly TAD has fewer number of columns/rows than user predefined cutoff (--min_nbin
), DiffDomain will assign a missing value as the P value and print the warning message "The matrix is too spase or too small at this resolution to be calculated !" Please see our description of the two parameters --min_nbin
and --f
here .
For a given resolution, if multiple warning messages prompt out, equivalently, many P values are missing values, it is a signal that a lower-resolution should be used instead. One extreme example is using 1 kb resolution Hi-C data to compare K562 TADs between K562 and KBM7 cell lines in Rao et al 2014 paper.
A response to the ValueError: Genomic region out of bounds: [132160000, 133800000).
Based on the description, my educated guess is that the reference genome for the Hi-C data data1.mcool data2.mcool
is different from the reference genome of the TAD list TAD_list_data1.bed
. Please kindly let us know if this is the case.
Thanks for your reponses!!
For the TAD_list_data1.bed, the reference genome is the same as both data in mcool file. It has been generated by using SpectralTADs and use with some other tools.
Co-author Ming Gu has another guess. It could be due to the rounding-up of the TAD region if the TAD covers the end of the chromosome. Please compare the chromosome size with 133800000, the end of the TAD region. If the chromosome size is smaller than 133800000, please modify the TAD region.
I will check if it's this. I will come back to you for an update on this.
And if you have a clue on the first issue i got, it would help :)
The check was pretty fast. My chromosome size is tell by hg38. And at the end of my TAD_List.bed for the chr10 the end of the domain is 133 720 000, which is less than 133 797 422.
I will check if it's this. I will come back to you for an update on this.
And if you have a clue on the first issue i got, it would help :)
Apologize! Forgot to answer this question. The reason behind the issue ValueError: zero-size array to reduction operation maximum which has no identity
is because certain k-
off diagonal entries of a Hi-C contact matrix of a particular TAD are all missing values or Infinity values.
Based on the description that .hic
file works fine, this happens only for .mcool
files is because the user specified normalized method, say KR, is not correctly called in extracting the contact matrices from.mcool
files. We will provide a quick fix. Please stay tuned.
Hi @FlavianR ,
Thank you for reaching out with your questions.
Regarding the first issue, the ValueError: zero-size array to reduction operation maximum which has no identity
, we have implemented a normalization process for .mcool and .cool files to correctly call user-specified correction methods such as KR.
As for the second question, ValueError: Genomic region out of bounds: [132160000, 133800000)
, we have updated the method for adjusting the upper bound of the TAD region according to a user-specific resolution.
You can access the latest version by updating with pip install diffDomain-py3 --upgrade
. Please let us know if this resolves the issues or if you have any further questions.
Best regards.
Thanks for this update!
Hello again!
I have an issue while I was testing dvsd one.
I use as follow :
And I obtain the error :
It's seem that this issue isn't visible when you use a hic file.
Thank you in advance for your answers.