bluenote-1577 / flopp

flopp is a software package for single individual haplotype phasing of polyploid organisms from long read sequencing.

haplotagged bams ? #6

Open colindaven opened 2 years ago

colindaven commented 2 years ago

Hi,

Thanks for developing this tool, which I hope will prove useful in phasing triploid BAMs from the ONT tool megalodon, so that I can analyze methylation in these triploids in a future project.

I may have missed it, and I see that you optionally output separate per-haplotype BAMs with a post-processing script, but wouldn't it be better, or at least a useful alternative, to output a single haplotagged BAM carrying the phasing information? This has been requested for longphase, for example:

https://github.com/twolinin/longphase/issues/6

Haplotagged BAMs seem to be useful for downstream programs, e.g. https://github.com/rrazaghi/modbamtools, which I intend to use. Perhaps the separated BAMs would be equivalent (though more numerous).

Thanks, Colin

bluenote-1577 commented 2 years ago

Hi Colin,

Thanks for your interest in flopp. You're definitely correct that it would've been better from the start to just output haplotagged bams instead of splitting the bam file.

I've added a new script called haplotag_bam.py in the scripts/ folder. It does exactly the same thing as the post-processing script for splitting the BAM file, except that it creates a single new BAM with HP:i:x tags. I've updated the Output section of the README to describe how it works.
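For readers unfamiliar with haplotagging, here is a minimal sketch of what adding an HP:i:x tag to a read amounts to. It is not the code in haplotag_bam.py: it works on plain SAM text lines using only the standard library, and the function name `haplotag_sam_line`, the `read_to_hp` mapping, and the 1-based haplotype numbers are all assumptions made for illustration.

```python
def haplotag_sam_line(line, read_to_hp):
    """Return a SAM line with HP:i:<n> set for reads that have a haplotype.

    `read_to_hp` is a hypothetical dict mapping read name -> haplotype number.
    """
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    hp = read_to_hp.get(fields[0])
    if hp is None:
        return line  # reads without a phase assignment are left untagged
    # Keep the 11 mandatory SAM columns, drop any stale HP tag, append the new one.
    tags = [t for t in fields[11:] if not t.startswith("HP:")]
    tags.append(f"HP:i:{hp}")
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields[:11] + tags) + newline


# Toy example: three reads, two of them assigned to haplotypes of a triploid.
read_to_hp = {"read1": 1, "read2": 3}
sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "read2\t0\tchr1\t120\t60\t4M\t*\t0\t0\tACGT\tIIII\tHP:i:1",
    "read3\t0\tchr1\t140\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
tagged = [haplotag_sam_line(l, read_to_hp) for l in sam_lines]
```

Because every read stays in one file and only gains a tag, downstream tools can group reads by HP value instead of needing one BAM per haplotype.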

It seems to work okay from a quick test so far, although pysam does take a while to write BAM files. It would be nice if there were a way to modify the BAM file in place so it wouldn't take so long.

Let me know if the script does what you need it to.

Thanks,

Jim

colindaven commented 2 years ago

Thanks Jim for the quick reply and script, that's excellent. I'll test it for sure when the data comes through in a couple of months (don't have relevant datasets quite yet) and get back to you then.

cheers, Colin