Open Nuturetree opened 6 months ago
Using the input file in .cool format should be sufficient. I recommend trying lower resolutions, such as 40kb. If the issue persists, please provide screenshots of any warning messages from DiffDomain and of the output file displaying all columns.
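Thank you for your reply. The previous issue has been resolved; it was caused by an incorrect option on our side. However, a new problem has arisen. When I use the raw matrix as input, I get the following error:
python ~/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple J668_mock_0min_Ghjin_D05.matrix J668_Fov7_720min_Ghjin_D05.matrix Ghjin_D05_TAD_region.bed --reso 20000
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 66, in
comp2domins_by_twtest_parallel(0)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
k=domwin_dict[bin0]
KeyError: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
k=domwin_dict[bin0]
KeyError: 1
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in
result.append(i.get())
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
KeyError: 1
I suspect this is a problem with the matrix-to-cool or matrix-to-hic conversion process, but I am not sure exactly why. So I used HiCExplorer to convert the matrices to cool format and then ran the calculation, but that also reported an error.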
python /public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple J668_mock_0min_Ghjin_D05.cool J668_Fov7_720min_Ghjin_D05.cool Ghjin_D05_TAD_region.bed --reso 20000
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts['
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in
These are my input files: Glr19_mock_0min_Ghjin_D05.zip Glr19_Fov7_720min_Ghjin_D05.zip Ghjin_D05_TAD.zip
I wonder whether the error is due to the unequal number of lines in the input files.
This is the number of Hi-C interactions for the same material at different time points.
Because the two sparse matrices do not contain bin pairs with zero interactions, they have different numbers of rows, which made it impossible to compare them after generating the cool files. I used a script from HiC-Pro to generate a symmetric N*N matrix, converted the lower triangular matrix to three columns ("bin1 bin2 reads"), and then used HiCExplorer to convert the matrix to cool. I then compared the two cool files, but the fourth and fifth columns did not have any values.
Script:
python ~/biosoft/HiC-Pro-master/bin/utils/sparseToDense.py ${out_dir}/${c}.matrix -o ${out_dir}/${c}_Symmetries.matrix
Get lower triangular interactions:
import numpy as np
import pandas as pd

def extract_lower_triangle(df):
    N = df.shape[0]
    # Get the row and column indices of the lower triangle (diagonal included)
    lower_tri_indices = np.tril_indices(N)
    row_indices = lower_tri_indices[0]
    col_indices = lower_tri_indices[1]
    # Extract the values from the DataFrame using these indices
    values = df.values[lower_tri_indices]
    # Create a new DataFrame to store these values with proper labelling
    lower_tri_df = pd.DataFrame({
        'Row': row_indices + 1,  # Convert to 1-based indexing
        'Column': col_indices + 1,  # Convert to 1-based indexing
        'Value': values
    })
    return lower_tri_df
Convert to cool:
hicConvertFormat -m ${mtx_f1} --bedFileHicpro ${abs_f} --inputFormat hicpro --outputFormat cool -o ${cool_f1} --resolutions 20000
Run DiffDomain:
python ~/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple ${cool_f1} ${cool_f2} Ghjin_D05_TAD_region.bed --reso 20000
I can get results, but the fifth and sixth columns don't have any values. The cool1, cool2, TAD, and result files are in the attachment input_result.zip
Let's address the issue with this specific usage first. Thank you for providing the example data. The format of Glr19_mock_0min_Ghjin_D05.matrix and Glr19_Fov7_720min_Ghjin_D05.matrix does not meet the requirements of DiffDomain. In the three-column input format for DiffDomain, the first two columns give the exact genomic locations (bin ID * reso) of the two bins in a chromatin interaction, similar to the output of the straw function.
For example, the first two lines in Glr19_mock_0min_Ghjin_D05.matrix should be
20000 20000 57
20000 40000 58
rather than
1 1 57
1 2 58
Please revise the format of the .matrix files and try this usage again. Kindly let us know if the issue is solved.
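The conversion described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the .matrix file holds whitespace-separated "bin1 bin2 count" rows with integer bin IDs and that each position is simply bin ID * reso, as in the example; the function name bins_to_positions is hypothetical and not part of DiffDomain.

```python
def bins_to_positions(lines, reso):
    """Rewrite 'bin1 bin2 count' rows so the first two columns are bin ID * reso."""
    converted = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        bin1, bin2, count = fields
        # Assumption: position = bin ID * reso, following the example in this thread
        converted.append(f"{int(bin1) * reso} {int(bin2) * reso} {count}")
    return converted


rows = ["1 1 57", "1 2 58"]
for row in bins_to_positions(rows, reso=20000):
    print(row)
```

With reso=20000 this turns the row 1 1 57 into 20000 20000 57, matching the expected format shown above. The count column is passed through as text so that its original formatting is preserved.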
Thank you very much for your answer. I followed your suggestion of using bin_id*reso, but still got the error message below:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 66, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts['
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in
The input matrices: J668_Fov7_720min_Ghjin_D05_reso.zip J668_mock_0min_Ghjin_D05_reso.zip
I really think DiffDomain is a very useful tool, and thanks for your prompt reply!
A quick response first. There is a bug when loading three-column files as input. The current version assigns every chromatin interaction in the input file to every TAD, which is wrong. For example, assigning the interaction 20000 20000 49 to the TAD Ghjin_D05 320000 500000 will raise KeyError: 20000, since bin 20000 does not belong to the TAD region 320000-500000. A quick fix is on the way and will be uploaded soon.
Meanwhile, input files in .hic or .cool/.mcool format are fine, because DiffDomain uses straw for .hic files or fetch for .cool/.mcool files to first extract the subset of chromatin interactions that are within a given TAD.
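The filtering step that straw and fetch perform for .hic and .cool files can be illustrated for the three-column format as well. The following is a minimal sketch, not DiffDomain's actual fix: interactions_in_tad is a hypothetical helper, and treating the TAD as the half-open interval [start, end) is an assumption.

```python
def interactions_in_tad(interactions, start, end):
    """Keep only interactions whose two bins both fall inside [start, end)."""
    return [
        (pos1, pos2, count)
        for pos1, pos2, count in interactions
        if start <= pos1 < end and start <= pos2 < end
    ]


interactions = [
    (20000, 20000, 49),    # outside the TAD, must not be assigned to it
    (320000, 340000, 12),  # both bins inside the TAD
    (480000, 520000, 3),   # second bin lies beyond the TAD end
]
tad_subset = interactions_in_tad(interactions, start=320000, end=500000)
print(tad_subset)
```

Only the interaction with both bins inside 320000-500000 survives the filter, so a per-TAD lookup keyed by bin position never sees a bin such as 20000.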
We have fixed the bug in reading chromatin interactions in the three-column sparse format. Please follow the instructions in Method1 to set up the conda environment, install DiffDomain, and rerun the command.
This version has been tested on macOS.
Code
python3 diffDomain/diffdomain-py3/diffdomains.py dvsd multiple J668_Fov7_720min_Ghjin_D05_reso.matrix J668_mock_0min_Ghjin_D05_reso.matrix Ghjin_D05_TAD_region.bed --reso 20000 --ofile test.tsv
Output
Full results here test.tsv.zip
Thank you for your reply. I would like to ask whether the data are normalized (KR or ICE) when the three-column sparse format is used as input. I found that a normalized cool file is generated when using cool as input, whereas no such file is generated when using the three-column sparse format.
The three-column sparse format is used as is, with no normalization performed by DiffDomain. The .hic or .cool/.mcool formats are highly recommended for using normalized Hi-C interactions.
Following your suggestion, I tried using the cool file as the input file, but it generated an error: test_input.zip
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
fhic0=opts['
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/diffdomains.py", line 76, in
The cool file was generated using hicConvertFormat.
Hi author, DiffDomain is a useful tool, but there seems to be something strange about my results: none of the TADs have p-values. Is it possible that the input file should be the original (raw) matrix rather than the ICE-normalized matrix? Thanks.