bluenote-1577 / flopp

flopp is a software package for single individual haplotype phasing of polyploid organisms from long read sequencing.

Post-processing steps #9

Open danessel opened 1 year ago

danessel commented 1 year ago

Hello, thanks for developing the tool. I'm also using it on ONT reads of potato and also see discrepancies. For example, the reads (~20) carrying the simplex SNPs are separated by flopp into 3 haplotypes, and all the others are in the fourth haplotype, which has DP=85. As you stated in the manuscript: "We believe that clever post-processing of flopp's output by looking at coverage can recover correct haplotypes in many instances of collapsing. We envision implementing such a post-processing step in future versions of flopp". Is there any development in this direction?

Would pre-selection of SNPs help in this respect, e.g. selecting only the simplex variants? Of course, I also find regions in which flopp performs well ;-)

bluenote-1577 commented 1 year ago

Hi @danessel,

Thank you for using flopp!

> Is there any development in this direction?

Unfortunately, I have not done any more development in this direction so far. If I get more user requests for it, I'll strongly consider it.

As for this problem, I have a few suggestions.

If the other 3 haplotypes are exactly the same for a long stretch (longer than any read length), it may be impossible to separate them algorithmically, but maybe tweaking parameters can help.

  1. Consider setting -s to a higher number, say 50 or 100. I haven't played around with this much, but in theory it forces the partitioning to be more uniform. It may make the output worse, though.
  2. Perhaps removing the ~20 reads belonging to the single haplotype from the BAM/SAM file and then re-haplotyping with the ploidy set to 3 will force the 3 collapsed haplotypes to separate. See the README for how to access the read information for the output haplotypes.
  3. Consider setting the block length -B to 0.5 or higher. This may let flopp use longer read information to disambiguate the collapsed haplotypes.
  4. I agree with the pre-selection of SNPs idea, although the execution is a bit tricky. In the case where you have 3 haplotypes collapsing, duplex SNPs may be the most informative, because they allow you to separate things within the 3 haplotypes. If you have simplex SNPs that come from the collapsed haplotype, then I agree that you should definitely keep those SNPs, and maybe throw out SNPs that come from the single haplotype with ~20 reads.
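
Suggestions 2 and 4 can be sketched roughly as below. This is only an illustration and is not part of flopp: the function names `drop_reads`, `alt_dosage`, and `select_by_dosage` are hypothetical, the input is assumed to be plain-text SAM and a single-sample VCF with a `GT` field, and the set of read names to remove is assumed to have been collected by hand from flopp's per-haplotype read output (see the README for that format).

```python
def drop_reads(sam_lines, read_names):
    """Yield SAM lines, skipping alignments whose QNAME is in read_names."""
    names = set(read_names)
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        qname = line.split("\t", 1)[0]
        if qname not in names:
            yield line


def alt_dosage(gt):
    """Count non-reference alleles in a GT string such as '0/0/0/1'."""
    alleles = gt.replace("|", "/").split("/")
    # Assumption: any allele other than '0' or missing ('.') counts as alt.
    return sum(1 for a in alleles if a not in ("0", "."))


def select_by_dosage(vcf_lines, keep_dosages):
    """Yield VCF header lines plus records whose alt dosage is in keep_dosages.

    Assumes a single-sample VCF (sample column is the 10th field).
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[fmt_keys.index("GT")]
        if alt_dosage(gt) in keep_dosages:
            yield line
```

For example, `select_by_dosage(vcf_lines, {2})` would keep only duplex SNPs in a tetraploid, and `drop_reads(sam_lines, names_of_the_20_reads)` would produce the reduced read set to re-haplotype with ploidy 3. A BAM file would first need converting to SAM text (for instance with `samtools view -h`) and back afterwards.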

Happy to know that flopp is working well in some regions. Let me know of any other comments you have!

Jim