bluenote-1577 / flopp

flopp is a software package for single individual haplotype phasing of polyploid organisms from long read sequencing.

How to prepare the input vcf file for polyploid genome? #14

Open xxllgg opened 11 months ago

xxllgg commented 11 months ago

Hi there, Thanks for your great software! I'm using flopp to phase triploid and tetraploid plant genomes. I have two questions:

  1. Should I use the contig assembly (the output of hifiasm: hap1.p_ctg, hap2.p_ctg) or the chromosome assembly (with a lot of collapsed regions)?
  2. How do I prepare a phased vcf file like the pds.vcf in the test_vcfs folder? I tried longshot and freebayes, but couldn't get the same result. Could you give me some advice?

Thank you! Xiaolong

bluenote-1577 commented 10 months ago

Hi @xxllgg,

Thanks for using flopp.

  1. I'm not very familiar with hifiasm's outputs, but my intuition is that the chromosome assembly with collapsed regions may be better, because the contigs output by hifiasm may already be resolved at the haplotype level and may not need the further phasing that flopp provides. I would try both, and see how many SNPs are called against the contig assembly to get a feeling for how collapsed hifiasm's outputs are.
  2. The pds.vcf file was generated synthetically, so I didn't use other software to produce it. flopp does not require a phased vcf as input: it outputs a phasing itself, so it doesn't use any phasing information in the vcf.
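
As a rough way to "see how many SNPs" there are per contig, something like the following could work. This is only a minimal sketch, not part of flopp: the function name `count_snps` and the example records are made up for illustration, and it assumes a plain-text, tab-separated VCF with the standard column order (CHROM, POS, ID, REF, ALT, ...).

```python
def count_snps(vcf_lines):
    """Count SNP records (single-base REF and single-base ALT) per contig."""
    counts = {}
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alt = fields[0], fields[3], fields[4]
        # Keep only simple substitutions; indels and multi-allelic
        # records (comma-separated ALT) are skipped.
        if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
            counts[chrom] = counts.get(chrom, 0) + 1
    return counts


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ctg1\t100\t.\tA\tG\t50\tPASS\t.",
    "ctg1\t250\t.\tAT\tA\t50\tPASS\t.",  # deletion, not counted
    "ctg2\t75\t.\tC\tT\t50\tPASS\t.",
]
print(count_snps(example))
```

For a real file you could pass an open file handle instead of the list, e.g. `count_snps(open("calls.vcf"))`, where `calls.vcf` is a placeholder name. Very few SNPs on a contig would suggest it is already haplotype-resolved rather than collapsed.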

Thanks,

Jim

xxllgg commented 10 months ago

Hi Jim,

Thanks for your help. I will try it as you suggest. Another situation is that our triploid genome assembly has been partly phased using Hi-C. I calculated the read coverage and found that the read depth in some regions is much higher than in others. I think these are the collapsed regions.

[plot: chr3]

So, should I use flopp to resolve only the high-depth regions or the whole chromosome? And from what I understand, the phased read results of flopp might look like: Chr3.1-0, Chr3.1-1, Chr3.1-2, Chr3.2-0, Chr3.2-1, Chr3.2-2... How can I use this read information to get the correct phasing results? Can I use the reads from Chr3.1-0 + Chr3.2-0 + Chr3.3-0 to get one phased chromosome?

Thanks, Xiaolong

bluenote-1577 commented 10 months ago

Hi Xiaolong,

Thanks for the very informative plot.

Flopp will only work on collapsed, high-depth regions, so I suggest you only work with these collapsed regions (and the BAM alignments within these regions). Flopp's haplotypes will not be accurate if you run it on a non-collapsed region, since a non-collapsed region does not need to be phased anyway.

Unfortunately, flopp will not automatically recognize which regions are collapsed and which are not, so you may have to do some processing for this. This is because flopp was designed primarily for phasing against an entire chromosome, rather than an assembly, but you can still use flopp for an assembly as long as you only phase high-depth regions.
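
One possible way to do that processing is sketched below. It is not something flopp provides: it assumes you already have mean depth per fixed window (for example summarized from a tool such as `samtools depth` or mosdepth) and a rough per-haplotype depth. The function names, the `min_ratio=1.5` cutoff, and the window coordinates are all hypothetical choices for illustration.

```python
def _finish(cur):
    """Turn the running [chrom, start, end, summed depth*bases] into a region."""
    chrom, start, end, total = cur
    return (chrom, start, end, total / (end - start))


def high_depth_regions(windows, hap_depth, min_ratio=1.5):
    """Merge adjacent windows whose depth is at least min_ratio * hap_depth.

    windows: list of (chrom, start, end, mean_depth), sorted by chrom, start.
    Returns merged regions as (chrom, start, end, length-weighted mean depth).
    """
    regions = []
    current = None
    for chrom, start, end, depth in windows:
        is_high = depth >= min_ratio * hap_depth
        if is_high and current and current[0] == chrom and current[2] == start:
            # Contiguous high-depth window: extend the open region.
            current[2] = end
            current[3] += depth * (end - start)
        else:
            # Close any open region, then maybe start a new one.
            if current:
                regions.append(_finish(current))
                current = None
            if is_high:
                current = [chrom, start, end, depth * (end - start)]
    if current:
        regions.append(_finish(current))
    return regions


# Hypothetical 1 kb windows on chr3, for illustration only.
windows = [
    ("chr3", 0, 1000, 24.0),
    ("chr3", 1000, 2000, 51.0),
    ("chr3", 2000, 3000, 49.0),
    ("chr3", 3000, 4000, 26.0),
]
print(high_depth_regions(windows, hap_depth=25.0))
```

The cutoff is a judgment call: 1.5x sits halfway between one and two haplotype copies, and noisy coverage may need larger windows or some smoothing first.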

It looks like single haplotype depth is ~25x coverage, so the collapsed regions are ~50x coverage, meaning two haplotypes get collapsed together? Correct me if I am wrong.

What I would do is for each collapsed region, look at its coverage. If it has ~50x coverage, assume it has ploidy 2 (-p 2) and use flopp to phase just this section. Importantly, the collapsed region looks like ploidy 2, so even though the whole organism has ploidy 3, this section has ploidy 2.
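
The arithmetic behind this can be sketched as follows, assuming depth scales roughly linearly with the number of collapsed haplotypes. `local_ploidy` is a made-up helper, not a flopp function; the result is what you would pass to `-p` for that region.

```python
def local_ploidy(region_depth, hap_depth, max_ploidy):
    """Estimate how many haplotypes are collapsed in a region.

    Uses round(region_depth / hap_depth), clamped to [1, max_ploidy],
    where max_ploidy is the ploidy of the whole organism.
    """
    estimate = round(region_depth / hap_depth)
    return max(1, min(max_ploidy, estimate))


# With ~25x per haplotype in a triploid:
for depth in (25, 50, 75):
    print(depth, local_ploidy(depth, hap_depth=25, max_ploidy=3))
```

A region that comes out as 1 is not collapsed and does not need phasing. Depths that fall near a half-multiple of the per-haplotype depth (e.g. ~37x here) are ambiguous and probably worth inspecting by hand.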

Regarding what reads to use, I would use all reads that map to the collapsed regions.

Thanks,

Jim

zhangyixing3 commented 6 months ago

Hi, I encountered the same problem. The collapsed regions may be very homogeneous and difficult to phase. Have you succeeded? Can you give me some advice?