BGI-Qingdao / HicTrioBinning


Can I change Hi-C data? #1

Open naturalstay opened 1 year ago

naturalstay commented 1 year ago

Hi, thank you for the powerful script. I now have haplotype-resolved contig-level assemblies, but I don't have the child Hi-C. I only have the Hi-C data of this sample. Can I use this script to get haplotype-resolved Hi-C data? I look forward to your reply.

adonis316 commented 1 year ago

Hi, thank you for using this script. There are two ways to generate haplotype-resolved Hi-C data.

  1. If you have no haplotype-resolved assemblies, then please use haplotype-resolved k-mers to bin the Hi-C data.
  2. If you already have haplotype-resolved assemblies, then you can use this script directly to generate haplotype-resolved Hi-C data. You have to provide two haplotype-resolved contig-level assemblies, and one Hi-C dataset for the same sample. It should work for you.

Thanks, Mengyang

naturalstay commented 1 year ago

Thank you for your reply. Now I understand which Hi-C data is used here, but I have another question. Lines 107 and 109 of HTB.sh use `awk '{if($3==$4) printf("%s\t%s\n",$1,$5+$6);}'`. It seems that only read pairs whose two reads align to the same contig are retained, and all others are filtered out. I think this is unreasonable for Hi-C data. What do you think? Thanks, I look forward to your reply.
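
To make the effect of that filter concrete, here is a minimal sketch that runs the quoted awk command on a made-up table. The column meanings are assumptions for illustration only ($1 as the read-pair name, $3 and $4 as the contigs hit by the two reads, $5 and $6 as per-read scores); the actual layout of the intermediate file in HTB.sh is not described in this thread.

```shell
# Toy input. Column meanings are ASSUMED, not taken from HTB.sh:
#   $1 = read-pair name, $2 = placeholder, $3/$4 = contig hit by each read,
#   $5/$6 = a per-read score.
toy_pairs=$(printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  pairA x ctg1 ctg1 60 60 \
  pairB x ctg1 ctg2 60 60 \
  pairC x ctg3 ctg3 30 40)

# The filter quoted above: keep a pair only if both reads hit the same contig,
# then print the pair name and the sum of columns 5 and 6.
kept=$(printf '%s\n' "$toy_pairs" | awk '{if($3==$4) printf("%s\t%s\n",$1,$5+$6);}')

printf '%s\n' "$kept"
```

On this toy input, pairA and pairC are printed and pairB, whose reads hit ctg1 and ctg2, is dropped. Pairs like pairB are the inter-contig links that the question is about.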

adonis316 commented 1 year ago

Thank you for the suggestion. This is a strict constraint intended to filter out multiple alignments, which are common because Hi-C reads are short. It works well when the input haplotype-resolved contig assemblies have good contiguity, but it may reduce the number of links available for building scaffolds among contigs. We will remove this constraint in the next version.
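
Until that version is available, one possible way to relax the constraint is sketched below. This is not the maintainers' planned change, only an illustration: it drops the `$3==$4` condition and adds a hypothetical third output column that labels each pair as intra- or inter-contig. The toy input and its column meanings are the same assumptions as before ($3 and $4 as contigs, $5 and $6 as scores), and any step that expects exactly two output columns would need adjusting.

```shell
# Toy input with ASSUMED columns: $1 = pair name, $3/$4 = contigs, $5/$6 = scores.
toy_pairs=$(printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  pairA x ctg1 ctg1 60 60 \
  pairB x ctg1 ctg2 60 60 \
  pairC x ctg3 ctg3 30 40)

# Relaxed variant (illustrative only): keep every pair and tag it, instead of
# discarding pairs whose reads hit different contigs.
relaxed=$(printf '%s\n' "$toy_pairs" | awk '{
  tag = ($3 == $4) ? "intra" : "inter";   # hypothetical extra column
  printf("%s\t%s\t%s\n", $1, $5 + $6, tag);
}')

printf '%s\n' "$relaxed"
```

With the tag in place, inter-contig pairs such as pairB stay available for scaffolding, while intra-contig pairs can still be selected separately if the strict behaviour is wanted.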

Thanks, Mengyang

naturalstay commented 1 year ago

Thank you for your reply.