dyxstat / HiCBin

HiCBin: binning metagenomic contigs and recovering metagenome-assembled genomes using Hi-C contact maps
GNU Affero General Public License v3.0

error on 'Normalizing raw contacts by HiCzin' #2

Closed Taojianchang closed 3 years ago

Taojianchang commented 3 years ago

Dear sir,

When using HiCzin, I got an error at the step 'Normalizing raw contacts by HiCzin'. The log file and current output files are attached below. Thanks!

valid_contact.csv.txt

hicbin.log

contig_info.csv.txt

dyxstat commented 3 years ago

Hi Jianchang,

Thanks for reporting this problem. It comes from your coverage file: the coverages of 1,502 contigs are zero. As a result, during normalization the bias factor \log(coverage_i * coverage_j) becomes negative infinity for any contig pair that includes one of them. For more details about the normalization, please refer to our RECOMB HiCzin paper.
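To illustrate why zero coverage breaks this factor, here is a minimal Python sketch. It is not the HiCBin or HiCzin code: the helper names `log_coverage_product` and `zero_coverage_contigs` and the toy coverage values are hypothetical, chosen only to show the effect.

```python
import math

def log_coverage_product(cov_i, cov_j):
    """Return log(cov_i * cov_j), or -inf when the product is zero."""
    product = cov_i * cov_j
    # math.log(0) raises ValueError, so the zero case is handled explicitly.
    return math.log(product) if product > 0 else float("-inf")

def zero_coverage_contigs(coverage):
    """Return the names of contigs whose coverage is zero (hypothetical helper)."""
    return sorted(name for name, cov in coverage.items() if cov == 0)

# Toy data, invented for illustration only.
coverage = {"contig_1": 12.5, "contig_2": 0.0, "contig_3": 3.0}

print(zero_coverage_contigs(coverage))
print(log_coverage_product(coverage["contig_1"], coverage["contig_3"]))
print(log_coverage_product(coverage["contig_1"], coverage["contig_2"]))
```

Running a check like `zero_coverage_contigs` on a coverage table before normalization would show in advance how many contigs are affected.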

I have updated hicbin.py and Hiczin.R to solve this problem. Please replace the original two scripts with the updated versions.

Moreover, on the command line, '--thres' denotes the acceptable fraction of incorrectly identified valid contacts and ranges from 0 to 1. We suggest not choosing a large value, as this will discard many 'good' Hi-C contacts. In our experiments, 5% was already enough to discard most spurious contacts, and it is also our default value.
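One way to picture what such a fraction means is sketched below. It assumes, and this is an assumption rather than something taken from the HiCBin source, that the cutoff is the value below which at most that fraction of known-valid normalized contacts falls. The function name `cutoff_for_fraction` and the toy values are hypothetical.

```python
def cutoff_for_fraction(valid_values, thres):
    """Return a cutoff such that at most `thres` of valid_values lie strictly below it.

    Illustrative only: this is an assumed reading of a fraction-based
    threshold, not the procedure implemented in HiCBin.
    """
    if not 0 <= thres <= 1:
        raise ValueError("thres must be between 0 and 1")
    if not valid_values:
        raise ValueError("valid_values must not be empty")
    ordered = sorted(valid_values)
    # Number of smallest values that fall below the cutoff and would be discarded.
    n_discard = min(int(thres * len(ordered)), len(ordered) - 1)
    return ordered[n_discard]

# Toy normalized contact values, invented for illustration only.
values = list(range(1, 21))
cutoff = cutoff_for_fraction(values, 0.05)
discarded = [v for v in values if v < cutoff]
print(cutoff, len(discarded) / len(values))
```

Under this reading, raising the fraction moves the cutoff up and removes more valid contacts, which matches the advice to keep the value small.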

Best, Yuxuan

Taojianchang commented 3 years ago

Hi Yuxuan,

Thanks for solving the problem and for the suggestion on the '--thres' setting. I tried again; however, another error occurred. The error log is:

Traceback (most recent call last):
  File "./hicbin.py", line 185, in <module>
    ifelse(args.min_binsize, runtime_defaults['min_binsize']))
  File "/home/jianchang/software/HiCBin/Cluster.py", line 55, in __init__
    self.norm()
  File "/home/jianchang/software/HiCBin/Cluster.py", line 89, in norm
    c = (log(c1*c2)-self.norm_result[9])/self.norm_result[10]
ValueError: math domain error

Best, Jianchang

dyxstat commented 3 years ago

Thanks, Jianchang. This problem again occurs because the abundances of some of your contigs are zero, so log(c1*c2) in 'Cluster.py' is undefined for those contigs.

Please update the script 'Cluster.py' to solve this problem.
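The `ValueError` in the traceback is what Python's `math.log` raises for a non-positive argument. The sketch below reproduces that and shows one possible guard. It is not the actual fix in 'Cluster.py': the helper name `safe_log_product` and the choice to return `None` for zero-abundance pairs are assumptions made for illustration.

```python
import math

def safe_log_product(c1, c2):
    """Return log(c1 * c2), or None when the product is not positive.

    Hypothetical guard: the caller decides how to treat pairs that
    involve a zero-abundance contig (for example, by skipping them).
    """
    product = c1 * c2
    if product <= 0:
        return None
    return math.log(product)

# Reproduce the failure: a zero abundance makes the product zero.
try:
    math.log(0 * 5.0)
except ValueError as err:
    print("math.log failed:", err)

print(safe_log_product(0, 5.0))
print(safe_log_product(2.0, 3.0))
```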

Best, Yuxuan

Taojianchang commented 3 years ago

Thank you. I got the resulting bins now.