Tian-Dechao / diffDomain

DiffDomain is a statistically sound method for detecting differential TADs between conditions
MIT License

output file #20

Open Nuturetree opened 6 months ago

Nuturetree commented 6 months ago

Hi, authors: DiffDomain is a useful tool, but something seems strange about my results: none of the TADs have p-values. Is it possible that the input file should provide the raw matrix rather than the ICE-normalized matrix? Thanks.

Nuturetree commented 6 months ago

[Screenshot: Snipaste_2024-04-23_11-33-10]

Tian-Dechao commented 6 months ago

Using the input file in .cool format should be sufficient. I recommend trying lower resolutions, such as 40kb. If the issue persists, please provide screenshots of any warning messages from DiffDomain and of the output file displaying all columns.

Nuturetree commented 6 months ago

Thank you for your reply. The last issue has been resolved; it was caused by a wrong option on our side. But a new problem has arisen. When I use the raw matrix as input:

python ~/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple J668_mock_0min_Ghjin_D05.matrix J668_Fov7_720min_Ghjin_D05.matrix Ghjin_D05_TAD_region.bed --reso 20000

I get the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 66, in <module>
    comp2domins_by_twtest_parallel(0)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
    mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
    k=domwin_dict[bin0]
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
    mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
    k=domwin_dict[bin0]
KeyError: 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in <module>
    result.append(i.get())
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
KeyError: 1

I suspect this may be a problem with the matrix-to-cool or matrix-to-hic conversion, but I'm not sure exactly why. So I used HiCExplorer to convert the matrices to .cool format and then reran the calculation, but it also reported an error.

Nuturetree commented 6 months ago

python /public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple J668_mock_0min_Ghjin_D05.cool J668_Fov7_720min_Ghjin_D05.cool Ghjin_D05_TAD_region.bed --reso 20000

Nuturetree commented 6 months ago

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 379, in comp2domins_by_twtest
    Diffmatnorm = normDiffbyMeanSD(D=Diffmat)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 272, in normDiffbyMeanSD
    b[k] = np.max(val1)
  File "<__array_function__ internals>", line 6, in amax
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2755, in amax
    keepdims=keepdims, initial=initial, where=where)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in <module>
    result.append(i.get())
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: zero-size array to reduction operation maximum which has no identity
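The ValueError in this traceback is raised by NumPy itself: `np.max` (`amax`) on an empty array has no identity element to return, and the traceback shows `normDiffbyMeanSD` calling `np.max(val1)` on such an array. The snippet below reproduces the message and shows one hypothetical guard, `safe_max`, that returns NaN for empty input. It only illustrates the failure mode; it is not DiffDomain's code or the maintainers' fix.

```python
import numpy as np


def safe_max(values):
    """Hypothetical guard: maximum of `values`, or NaN if there are none."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        return np.nan  # np.max would raise ValueError on an empty array
    return np.max(arr)


# Reproduce the error seen in the traceback
try:
    np.max(np.array([]))
except ValueError as err:
    print(err)  # zero-size array to reduction operation maximum which has no identity

print(safe_max([]))         # nan
print(safe_max([1, 3, 2]))  # 3.0
```

Whether NaN (or skipping the value entirely) is the right behavior for DiffDomain's statistic is a separate question that this sketch does not answer.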

Nuturetree commented 6 months ago

These are my input files: Glr19_mock_0min_Ghjin_D05.zip Glr19_Fov7_720min_Ghjin_D05.zip Ghjin_D05_TAD.zip

Nuturetree commented 6 months ago

I wonder whether it's because the input files have unequal numbers of lines.

Nuturetree commented 6 months ago

These are the Hi-C data for the same material at different time points.

Nuturetree commented 6 months ago

Attachment: Ghjin_D05_abs.zip

Nuturetree commented 6 months ago

Because the two matrices omit bin pairs with zero interactions, they have different numbers of rows, which made it impossible to compare them after generating the .cool files. So I used a HiC-Pro script to generate a symmetric N*N matrix, converted its lower triangle into three columns ("bin1 bin2 reads"), used HiCExplorer to convert that to .cool, and then compared them, but the fourth and fifth columns did not have any values.

Script:

python ~/biosoft/HiC-Pro-master/bin/utils/sparseToDense.py ${out_dir}/${c}.matrix -o ${out_dir}/${c}_Symmetries.matrix

import numpy as np
import pandas as pd


def extract_lower_triangle(df):
    """Get lower triangular interactions from an N x N DataFrame."""
    N = df.shape[0]
    # Get the row and column indices of the lower triangle (diagonal included)
    row_indices, col_indices = np.tril_indices(N)
    # Extract the values from the DataFrame using these indices
    values = df.values[row_indices, col_indices]
    # Create a new DataFrame to store these values with proper labelling
    lower_tri_df = pd.DataFrame({
        'Row': row_indices + 1,     # Convert to 1-based indexing
        'Column': col_indices + 1,  # Convert to 1-based indexing
        'Value': values,
    })
    return lower_tri_df

Convert to cool:

hicConvertFormat -m ${mtx_f1} --bedFileHicpro ${abs_f} --inputFormat hicpro --outputFormat cool -o ${cool_f1} --resolutions 20000

Run DiffDomain:

python ~/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py dvsd multiple ${cool_f1} ${cool_f2} Ghjin_D05_TAD_region.bed --reso 20000
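For reference, sparse (three-column) files normally omit zero-count pairs, so two conditions can legitimately have different line counts. One way to compare them without building a dense N*N matrix is to align the two lists on (bin1, bin2) and treat missing pairs as 0. The helper `align_sparse` is hypothetical and is not part of DiffDomain, HiC-Pro, or HiCExplorer; it assumes each bin pair appears at most once per file.

```python
def align_sparse(a, b):
    """Align two sparse contact lists of (bin1, bin2, count) on the bin pair.

    A pair present in only one list gets a count of 0 in the other.
    Returns a sorted list of (bin1, bin2, count_a, count_b).
    """
    da = {(i, j): c for i, j, c in a}  # later duplicates would overwrite earlier ones
    db = {(i, j): c for i, j, c in b}
    pairs = sorted(set(da) | set(db))
    return [(i, j, da.get((i, j), 0), db.get((i, j), 0)) for i, j in pairs]


mock = [(1, 1, 57), (1, 2, 58)]
treated = [(1, 1, 10), (2, 2, 5)]
for row in align_sparse(mock, treated):
    print(row)
# (1, 1, 57, 10)
# (1, 2, 58, 0)
# (2, 2, 0, 5)
```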

Nuturetree commented 6 months ago

I can get results, but the fifth and sixth columns have no values. The two .cool files, the TAD file, and the results are in the attachment input_result.zip.

Tian-Dechao commented 6 months ago


Let's address the issue with this specific usage first. Thank you for providing the example data. The format of Glr19_mock_0min_Ghjin_D05.matrix and Glr19_Fov7_720min_Ghjin_D05.matrix does not meet DiffDomain's requirements. In the three-column input format for DiffDomain, the first two columns give the exact genomic positions (bin ID * reso) of the two bins in a chromatin interaction, similar to the output of the straw function.

For example, the first two lines in the Glr19_mock_0min_Ghjin_D05.matrix should be

20000       20000       57
20000       40000       58  

rather than

1       1       57
1       2       58

Please revise the format of the .matrix files and try this usage again. Kindly let us know if the issue is solved.
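The conversion described in this reply (replace each bin ID by bin ID * reso) can be sketched as below. The function name `bins_to_positions` is hypothetical. It assumes whitespace-separated "bin1 bin2 count" lines whose bin IDs map directly to positions as in the example above; that simple multiplication may not hold for genome-wide HiC-Pro bin numbering spanning several chromosomes, where an offset per chromosome would be needed.

```python
def bins_to_positions(lines, reso):
    """Rewrite 'bin1 bin2 count' lines as 'pos1 pos2 count', with pos = bin ID * reso.

    Assumes whitespace-separated fields; blank lines are skipped and the
    count column is copied through unchanged.
    """
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        bin1, bin2, count = fields[:3]
        out.append(f"{int(bin1) * reso}\t{int(bin2) * reso}\t{count}")
    return out


print(bins_to_positions(["1\t1\t57", "1\t2\t58"], 20000))
# ['20000\t20000\t57', '20000\t40000\t58']
```

For a whole file, the lines of the opened input file can be passed in directly and the returned lines written out, one per line.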

Nuturetree commented 6 months ago

Thank you very much for your answer. I followed your suggestion to use bin_id*reso, but still got the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 66, in <module>
    comp2domins_by_twtest_parallel(0)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
    mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
    k=domwin_dict[bin0]
KeyError: 20000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 338, in comp2domins_by_twtest
    mat0 = contact_matrix_from_hic(chrn, start, end, reso, fhic0, hicnorm)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/utils.py", line 209, in contact_matrix_from_hic
    k=domwin_dict[bin0]
KeyError: 20000
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/site-packages/diffdomain_py3/diffdomains.py", line 76, in <module>
    result.append(i.get())
  File "/public/home/xhhuang/miniconda3/envs/diffdomain/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
KeyError: 20000

Nuturetree commented 6 months ago

The input matrices: J668_Fov7_720min_Ghjin_D05_reso.zip J668_mock_0min_Ghjin_D05_reso.zip

Nuturetree commented 6 months ago

I really think DiffDomain is a very useful tool. Thanks for your prompt reply!

Tian-Dechao commented 6 months ago

A quick response first. There is a bug when loading three-column files as input. The current version assigns every chromatin interaction in the input file to every TAD, which is wrong. For example, assigning the interaction 20000 20000 49 to the TAD Ghjin_D05 320000 500000 will obviously raise KeyError: 20000, since bin 20000 does not belong to the TAD region 320000-500000. A quick fix is on the way and will be uploaded soon.

Meanwhile, input files in .hic or .cool/.mcool format are not affected, because DiffDomain uses straw for .hic files, or fetch for .cool/.mcool files, to first extract only the chromatin interactions that lie within a given TAD.
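The problem described here can be illustrated with a small sketch: when building the contact matrix for one TAD from three-column data, interactions outside that TAD must be filtered out before their positions are looked up. This is not the authors' actual fix. The function `tad_contact_matrix`, the half-open [start, end) bin convention, and the symmetrization (assuming each pair is listed once) are all assumptions made for illustration.

```python
import numpy as np


def tad_contact_matrix(interactions, start, end, reso):
    """Hypothetical helper: dense contact matrix for one TAD from (pos1, pos2, count) triples.

    Only interactions whose two bins both fall in [start, end) are kept;
    everything else is skipped instead of being looked up, which avoids a
    KeyError. Assumes each bin pair appears once and symmetrizes the result.
    """
    index = {pos: i for i, pos in enumerate(range(start, end, reso))}  # bin start -> row/col
    n = len(index)
    mat = np.zeros((n, n))
    for pos1, pos2, count in interactions:
        if pos1 not in index or pos2 not in index:
            continue  # interaction lies outside this TAD
        i, j = index[pos1], index[pos2]
        mat[i, j] += count
        if i != j:
            mat[j, i] += count  # mirror off-diagonal contacts
    return mat


interactions = [(20000, 20000, 49), (320000, 320000, 5), (320000, 340000, 3)]
mat = tad_contact_matrix(interactions, 320000, 500000, 20000)
print(mat.shape)  # (9, 9)
```

Here the interaction 20000 20000 49 from the example is simply ignored for the TAD 320000-500000, rather than triggering KeyError: 20000.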

Tian-Dechao commented 6 months ago

We have fixed the bug in reading chromatin interactions in the three-column sparse format. Please follow the instructions in Method1 to set up the conda environment and install DiffDomain, then rerun the command.

This version has been tested on macOS. Command:

python3 diffDomain/diffdomain-py3/diffdomains.py dvsd multiple J668_Fov7_720min_Ghjin_D05_reso.matrix J668_mock_0min_Ghjin_D05_reso.matrix Ghjin_D05_TAD_region.bed --reso 20000 --ofile test.tsv

Output: [Screenshot 2024-04-28 at 9 46 17 PM]

Full results: test.tsv.zip

Nuturetree commented 6 months ago

Thank you for your reply. I would like to ask whether the files are normalized (KR or ICE) when the three-column sparse format is used as input. I found that a normalized .cool file is generated when a .cool file is the input, whereas no such file is generated with three-column sparse input.

Tian-Dechao commented 6 months ago


Input in the three-column sparse format is used as is; DiffDomain performs no normalization on it. The .hic or .cool/.mcool formats are highly recommended if you want to use normalized Hi-C interactions.
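Because three-column input is used as is, any normalization would have to happen before DiffDomain sees the file (for .cool files, `cooler balance` performs ICE balancing). To show what ICE-style normalization does, here is a minimal sketch of iterative correction on a dense symmetric matrix. The name `ice_balance` is hypothetical, and it deliberately omits steps real tools perform, such as filtering low-coverage bins and ignoring the first diagonals; it assumes at least one bin has nonzero coverage.

```python
import numpy as np


def ice_balance(mat, n_iter=200, tol=1e-8):
    """Minimal ICE-style iterative correction of a symmetric contact matrix.

    Returns (balanced, bias) with mat == balanced * outer(bias, bias).
    Bins with zero coverage stay at zero. A sketch only, not cooler's or
    DiffDomain's implementation.
    """
    W = np.array(mat, dtype=float)  # work on a float copy
    bias = np.ones(W.shape[0])
    for _ in range(n_iter):
        s = W.sum(axis=1)
        covered = s > 0
        s = s / s[covered].mean()  # relative coverage, mean 1 over covered bins
        s[~covered] = 1.0          # leave empty bins untouched
        W /= np.outer(s, s)
        bias *= s
        if np.abs(s[covered] - 1.0).max() < tol:
            break  # row sums are (nearly) equal
    return W, bias


M = np.array([[4.0, 2.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 2.0]])
W, bias = ice_balance(M)
print(W.sum(axis=1))  # row sums are approximately equal after balancing
```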

Nuturetree commented 3 months ago

Following your suggestion, I tried to use .cool files as input, but got an error (input files in test_input.zip):

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/diffdomains.py", line 59, in comp2domins_by_twtest_parallel
    fhic0=opts[''], fhic1=opts[''],min_nbin=int(opts['--min_nbin']),f=opts['--f'])
  File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/utils.py", line 385, in comp2domins_by_twtest
    Diffmatnorm = normDiffbyMeanSD(D=Diffmat)
  File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/utils.py", line 266, in normDiffbyMeanSD
    b[k] = np.max(val1)
  File "<__array_function__ internals>", line 6, in amax
  File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2755, in amax
    keepdims=keepdims, initial=initial, where=where)
  File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public/home/xhhuang/biosoft/diffDomain/diffdomain-py3/diffdomains.py", line 76, in <module>
    result.append(i.get())
  File "/public/home/xhhuang/miniconda3/envs/diffdomain2/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: zero-size array to reduction operation maximum which has no identity

Nuturetree commented 3 months ago

The .cool files were generated with hicConvertFormat.