KolmogorovLab / hapdup

Pipeline to convert a haploid assembly into diploid

Some questions about using HapDup #11

Closed zmz1988 closed 2 years ago

zmz1988 commented 2 years ago

Dear developers,

Thanks for the very nice tool! I have some questions about using HapDup. Could you please give me some hints?

(1) The README mentions that the read alignments are filtered for the unassembled regions. Does this mean that only reads from the unassembled regions are kept in the alignment file?

(2) In the output blocks BED file, I find two block statuses: MissingConcordancy and Discordancy. Would you mind giving a bit more information on the difference between the two? Does MissingConcordancy mean discordant but without supporting evidence?

(3) In the output, can we tell which haplotype (hap_1 or hap_2) corresponds to the input haplotype?

(4) Could I use HapDup to generate the heterozygous regions at the ends of contigs? I ask because my assembly from HiFi reads is less contiguous than my assembly from Nanopore reads, so I thought I could use the Nanopore contigs to connect the HiFi contigs. However, the HiFi contigs cannot be connected at some heterozygous regions, because they belong to different haplotigs than the ones in the Nanopore contig, so I can't easily fill the gap between two HiFi contigs. Could I instead use HapDup with Nanopore reads to generate the different haplotypes only at the ends of the HiFi contigs, without interfering with the interior of the contigs?

Thanks a lot in advance!

mikolmogorov commented 2 years ago

Hi,

  1. It's the opposite: only reads from assembled regions are kept.
  2. This is output by the Margin phasing procedure. MissingConcordancy means there is not enough information to connect phased blocks (e.g. not enough reads to connect them), and Discordancy means the connecting reads carry contradictory information.
  3. Neither, because the input haplotype is expected to be a chimera of two real haplotypes.
  4. I would start by producing haplotypes for the ONT contigs (with either ONT or HiFi reads). You can run it on the HiFi contigs as well, and in theory it should work for contig ends, but those regions are always less reliable.

Hope that helps, Mikhail
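
To get a feel for how often each status from point 2 occurs, the blocks BED file can be tallied with a few lines of Python. This is only a minimal sketch, not part of HapDup: the function name `count_block_statuses` is made up for illustration, the column holding the status is an assumption (the fourth, tab-separated "name" field), and the example records are invented. Check your own file and adjust `status_col` if the layout differs.

```python
from collections import Counter
import io


def count_block_statuses(lines, status_col=3):
    """Count status labels in tab-separated BED-like lines.

    status_col is a 0-based column index. The default of 3 (the BED
    "name" field) is an ASSUMPTION about the file layout, not something
    confirmed by the HapDup documentation.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip records that are too short to contain a status column.
        if len(fields) <= status_col:
            continue
        counts[fields[status_col]] += 1
    return counts


# Invented example records, only to show the expected input shape.
example = io.StringIO(
    "contig_1\t0\t15000\tMissingConcordancy\n"
    "contig_1\t15000\t42000\tDiscordancy\n"
    "contig_2\t0\t9000\tMissingConcordancy\n"
)
counts = count_block_statuses(example)
for status, n in sorted(counts.items()):
    print(f"{status}\t{n}")
```

For a real file, pass an open file handle instead of the in-memory example, e.g. `count_block_statuses(open(path))`, where `path` points to your own blocks BED file.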

zmz1988 commented 2 years ago

Thanks! Everything is clear now!