GaoLabXDU / HiSV

HiSV: a computational pipeline for structural variation detection from Hi-C data
MIT License

Stuck while running hiccovert #4

Open pierrebarry opened 1 year ago

pierrebarry commented 1 year ago

Hi,

I am trying to run HiSV, but I am stuck on the first step: converting the Hi-C file in BAM format into a matrix file. I ran the following command:

hiccovert --hic_file hic_faba_sorted.bam --binsize 50000 --format bam --ref ref.len --output Matrix_data --name Faba --cores 2

However, this command has been running for more than 2 weeks. The last lines written to standard output were:

chromosome: scaffold_4000 scaffold_4126
chromosome: scaffold_796 scaffold_796
chromosome: scaffold_796 scaffold_420
chromosome: scaffold_796 scaffold_3995
chromosome: scaffold_796 scaffold_3076
chromosome: scaffold_796 scaffold_1951
chromosome: scaffold_796 scaffold_3745
chromosome: scaffold_796 scaffold_4126
chromosome: scaffold_420 scaffold_420
chromosome: scaffold_420 scaffold_3995
chromosome: scaffold_420 scaffold_3076
chromosome: scaffold_420 scaffold_1951
chromosome: scaffold_420 scaffold_3745
chromosome: scaffold_420 scaffold_4126
chromosome: scaffold_3995 scaffold_3995
chromosome: scaffold_3995 scaffold_3076
chromosome: scaffold_3995 scaffold_1951
chromosome: scaffold_3995 scaffold_3745
chromosome: scaffold_3995 scaffold_4126
chromosome: scaffold_3076 scaffold_3076
chromosome: scaffold_3076 scaffold_1951
chromosome: scaffold_3076 scaffold_3745
chromosome: scaffold_3076 scaffold_4126
chromosome: scaffold_1951 scaffold_1951
chromosome: scaffold_1951 scaffold_3745
chromosome: scaffold_1951 scaffold_4126
chromosome: scaffold_3745 scaffold_3745
chromosome: scaffold_3745 scaffold_4126
chromosome: scaffold_4126 scaffold_4126

The size of the input BAM file (hic_faba_sorted.bam) is 6.3G. Is this running time normal?

Thanks for your help,

Pierre

GaoLabXDU commented 1 year ago

Hi Pierre,

According to the output you provided, the hiccovert module is still running, and its results can be found in Matrix_data/Intra_matrix or Matrix_data/Inter_matrix.

When we convert the BAM file into matrix files, we traverse the BAM file for each pair of chromosomes, so the running time depends on how many chromosome pairs you provide. Your reference appears to contain a large number of scaffolds, and therefore a very large number of pairs, which is why the run is so slow.
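To give a rough sense of the scaling, here is a minimal sketch that counts the pairs for a given number of scaffolds. It assumes, as the log above suggests, that each scaffold is paired with itself and with every scaffold after it. The function name `count_pairs` and the scaffold counts are hypothetical, chosen only for illustration; the real number of scaffolds in ref.len is not stated in this thread.

```python
def count_pairs(n_scaffolds: int) -> int:
    """Count unordered scaffold pairs, including each scaffold paired with itself."""
    return n_scaffolds * (n_scaffolds + 1) // 2


# Hypothetical scaffold counts, for illustration only.
for n in (20, 100, 4000):
    print(f"{n} scaffolds -> {count_pairs(n)} pairs")
```

Because the pair count grows roughly with the square of the number of scaffolds, reducing the scaffold list is likely to help far more than adding a few cores.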

In addition, our code can run in parallel: --cores sets the number of parallel threads. In the command you provided, this parameter is set to 2. We recommend increasing it as far as your machine allows to speed up the run.

I hope my answer can help you. If you have any questions, please let me know.

Thanks, Li

pierrebarry commented 1 year ago

Hi,

Thanks a lot for your answer. I restricted the analysis to the ~20 longest scaffolds of the reference genome, and the command ran successfully within a few hours!

Thanks for your help,

Pierre
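For readers who hit the same problem, the restriction to the longest scaffolds could be sketched as follows. This is only an illustration, not the method actually used in this thread: it assumes the length file has one scaffold per line in the form name, whitespace, length, which is not confirmed here, and the helper name `longest_scaffolds` and the example data are hypothetical.

```python
def longest_scaffolds(lines, n=20):
    """Return the n longest (name, length) entries from 'name length' lines."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        entries.append((fields[0], int(fields[1])))
    # Sort by length, longest first, and keep the top n.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries[:n]


# Made-up example data; in practice the lines would be read from the length file.
example = ["scaffold_1\t500", "scaffold_2\t1500", "scaffold_3\t1000"]
for name, length in longest_scaffolds(example, n=2):
    print(f"{name}\t{length}")
```

The selected lines can then be written to a smaller length file and passed to --ref in place of the full one.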