YingruWuGit / HiCKey

HiCKey

HiCKey detects TAD boundaries and their hierarchical structure in HiC data. For details of the methodology, please refer to "Deciphering hierarchical organization of topologically associated domains through change-point testing", BMC Bioinformatics (2021) (https://rdcu.be/clEFG). One advantage of HiCKey is that it outputs p-values for the detected boundaries.

R package

The R package of HiCKey lives in a separate repository, "HiCKeyR" (https://github.com/YingruWuGit/HiCKeyR). It was built with Rcpp from the same source code. The R package has one additional function that returns a sub-matrix of the segmented HiC data, so users can draw a heatmap of the results. The R package works on Windows and macOS.

Examples

There are four sample HiC datasets in the folder "examples".

Normalization

HiCKey requires normalized HiC data, that is, data from which the effect of power-law decay has been removed. The observed-over-expected procedure is usually preferred, as many HiC datasets are released in that form.

Otherwise, we suggest a simple normalization method: transform the raw reads with a power-law correction that depends on a single parameter. The parameter can be estimated from reads close to the main diagonal of the HiC matrix, where the power-law effect is most pronounced. Specifically, collect the non-zero reads in that near-diagonal region and fit a simple linear regression, in which the residual is treated as unknown error, to estimate the parameter.
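As an illustration of this kind of normalization, here is a minimal Python sketch. It is not HiCKey's own code. It assumes the raw reads decay as a power of the genomic distance, so that the log of a read is linear in the log of |i - j| with slope -alpha, and that the correction multiplies each read by |i - j| raised to alpha. The model form, the cutoff max_dist, and the function names estimate_alpha and normalize are all assumptions made for the example.

```python
import math


def estimate_alpha(matrix, max_dist):
    """Estimate the decay exponent from non-zero reads near the diagonal.

    Assumption: read ~ C * distance**(-alpha), so the least-squares slope of
    log(read) on log(distance) is -alpha.
    """
    xs, ys = [], []
    n = len(matrix)
    for i in range(n):
        # Only look at the band 1 <= j - i <= max_dist above the diagonal.
        for j in range(i + 1, min(n, i + max_dist + 1)):
            if matrix[i][j] > 0:
                xs.append(math.log(j - i))
                ys.append(math.log(matrix[i][j]))
    if len(set(xs)) < 2:
        raise ValueError("need non-zero reads at two or more distances")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def normalize(matrix, alpha):
    """Multiply each off-diagonal read by distance**alpha (assumed transform).

    Diagonal entries (distance 0) are left unchanged.
    """
    n = len(matrix)
    out = [list(row) for row in matrix]
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = matrix[i][j] * abs(i - j) ** alpha
    return out
```

A typical call would be alpha = estimate_alpha(raw, max_dist=10) followed by normalized = normalize(raw, alpha), where raw is a square list of lists. The cutoff of 10 bins is arbitrary and should be chosen to cover the region where the decay is strongest.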

Arguments setting

"BrownianP.txt" is the simulated distribution of our test statistic, which is needed in each analysis.

Users need to specify 6 arguments, one per line, in the file "arguments_HiCKey.txt" (for Windows) or "arguments_hickey" (for Linux).

For example, if HiC data is "nijchr16.txt", the "arguments_HiCKey.txt" can be:

C:/Users/Andrew/Documents/GitHub/HiCKey/examples/nijchr16.txt
C:/Users/Andrew/Documents/GitHub/HiCKey/BrownianP.txt
m
5
0.05
0.00005

If HiC data is "nijchr16_list.txt", the "arguments_HiCKey.txt" can be:

C:/Users/Andrew/Documents/GitHub/HiCKey/examples/nijchr16_list.txt
C:/Users/Andrew/Documents/GitHub/HiCKey/BrownianP.txt
1
5
0.05
0.00005

Note: the resolution should be 1 if the list-form HiC data was derived from matrix-form data.

If HiC data is "chr21_50kb.RAWobserved", the "arguments_HiCKey.txt" can be:

C:/Users/Andrew/Documents/GitHub/HiCKey/examples/chr21_50kb.RAWobserved
C:/Users/Andrew/Documents/GitHub/HiCKey/BrownianP.txt
50000
5
0.05
0.00005
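When running many datasets, the arguments file can be generated instead of edited by hand. The following Python sketch writes the six lines in the same order as the examples above. The helper name write_arguments is hypothetical and not part of HiCKey; the last three values are copied verbatim from the examples and passed through unchanged. Ending the file with a newline is also an assumption.

```python
from pathlib import Path


def write_arguments(arg_file, data_file, brownian_file, form_or_resolution,
                    remaining=("5", "0.05", "0.00005")):
    """Write a six-line HiCKey arguments file and return the lines written.

    form_or_resolution is "m" for matrix-form data, or the resolution
    (for example 1 or 50000) for list-form data, as in the examples.
    remaining holds the last three arguments, taken as-is from the examples.
    """
    lines = [str(data_file), str(brownian_file), str(form_or_resolution)]
    lines.extend(str(value) for value in remaining)
    if len(lines) != 6:
        raise ValueError("HiCKey expects exactly 6 arguments")
    Path(arg_file).write_text("\n".join(lines) + "\n")
    return lines
```

For instance, write_arguments("arguments_hickey", "examples/chr21_50kb.RAWobserved", "BrownianP.txt", 50000) reproduces the third example with relative paths.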

Usage

Download HiCKey.exe (or hickey for Linux, or compile the source code), arguments_HiCKey.txt (arguments_hickey for Linux) and BrownianP.txt, and prepare your HiC data file.

Modify the arguments in arguments_HiCKey.txt (arguments_hickey).

For Windows users, open Command Prompt, change directory to the folder containing HiCKey.exe and arguments_HiCKey.txt, then enter:

HiCKey arguments_HiCKey.txt

For Linux users, open Terminal, change directory to the folder containing hickey and arguments_hickey, then enter:

./hickey ./arguments_hickey

If the shell reports "bash: ./hickey: Permission denied", run chmod u+x ./hickey first and then execute the program again.
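The same invocation can be scripted, for example to process several chromosomes in a loop. The sketch below builds the command for either platform and runs it with Python's subprocess module. The function names hickey_command and run_hickey are hypothetical, and passing full paths for the executable and the arguments file (rather than the relative names shown above) is an assumption.

```python
import platform
import subprocess
from pathlib import Path


def hickey_command(workdir, system=None):
    """Return the command list for the folder holding HiCKey and its arguments file."""
    system = system or platform.system()
    workdir = Path(workdir)
    if system == "Windows":
        return [str(workdir / "HiCKey.exe"), str(workdir / "arguments_HiCKey.txt")]
    return [str(workdir / "hickey"), str(workdir / "arguments_hickey")]


def run_hickey(workdir, system=None):
    """Run HiCKey inside workdir; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(hickey_command(workdir, system), cwd=workdir, check=True)
```

Calling run_hickey(".") from the folder that contains the executable mirrors the manual steps above.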

Output

For a HiC data file named "xxxx", the output file is named "xxxx_output.txt" and is written to the same directory. The output file has three columns.

HiCKey also generates a BED file named "xxxx_TADs.bed" in the same directory. It also has three columns.

The rows in the BED file are arranged in batches. Suppose the highest order among all boundaries is 3: HiCKey records all TADs with order <= 3 in the first batch, then all TADs with order <= 2 in the second batch, and finally all TADs with order <= 1 in the last batch.
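To make the batch ordering concrete, here is a small Python sketch of one possible reading of it. It assumes that a TAD with order <= k is the interval between two consecutive boundaries whose order is at most k, and that order 1 marks the outermost level. Both points are assumptions for illustration, as are the function name tad_batches and the (position, order) input format; HiCKey's own output should be taken as the reference.

```python
def tad_batches(boundaries):
    """Group TAD intervals into batches, finest level first.

    boundaries: iterable of (position, order) pairs, order >= 1.
    Returns a list of batches; batch 0 uses every boundary with
    order <= max order, the last batch only boundaries with order <= 1.
    """
    ordered = sorted(boundaries)
    max_order = max(order for _, order in ordered)
    batches = []
    for k in range(max_order, 0, -1):
        # Keep the boundaries that exist at hierarchy level k.
        kept = [pos for pos, order in ordered if order <= k]
        # Consecutive kept boundaries delimit one TAD each.
        batches.append(list(zip(kept, kept[1:])))
    return batches
```

With boundaries at 0 and 40 (order 1), 20 (order 2), and 10 and 30 (order 3), the first batch holds four TADs, the second holds (0, 20) and (20, 40), and the last holds the single TAD (0, 40).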

References

Rao SSP, Huntley MH, Durand N, Stamenova EK, Bochkov I, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2015;159(7):1665–80.

Forcato M, Nicoletti C, Pal K, Livi C, Ferrari F, Bicciato S. Comparison of computational methods for Hi-C data analysis. Nat Methods. 2017;14:679–85.