WGLab / LinkedSV

MIT License

possorted_bam.bam versus phased_possorted_bam.bam #28

Open burrir opened 2 years ago

burrir commented 2 years ago

Hi, I would be curious to hear whether you expect the possorted_bam.bam output by "LongRanger align" (rather than the phased_possorted_bam.bam output by "LongRanger wgs") to work adequately in LinkedSV?

We are looking to replace LongRanger wgs, as it is tremendously time-consuming, and to our understanding, LongRanger align provides a phased output as well. We do not yet understand the difference between the BAM files output by the two pipelines.

Any insights would be much appreciated!

Best wishes, Reto

fangli80 commented 2 years ago

Hello Reto, yes, I understand that the LongRanger pipeline is pretty slow. I have not tested LinkedSV on "possorted_bam.bam". Does the BAM file have a "BX" tag for the barcode and an "HP" tag for the haplotype?

Thanks, Li

burrir commented 2 years ago

Hi Li,

thanks a lot for getting back to me on this question! The merely position-sorted BAM files contain the BX tag. This is what e.g. HapCUT2 uses to infer phase information (which is why I assumed the files carry phase info, but it is the information needed to infer phase, not the phase itself). However, as I now saw when checking, these files do not carry the HP tag. That explains the difference in naming, and why the possorted_bam will not work for LinkedSV.

Thanks again & best wishes! Reto
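
For anyone wanting to repeat the check described above, here is a minimal sketch of one way to count how many alignment records carry the BX and HP tags. It assumes the records are available as SAM text lines (for example, as printed by `samtools view`); the function name `count_tags` and the `max_reads` cap are hypothetical choices for this illustration, not part of LinkedSV or LongRanger.

```python
from collections import Counter


def count_tags(sam_lines, tags=("BX", "HP"), max_reads=100000):
    """Count alignment records that carry each of the given optional tags.

    sam_lines: any iterable of SAM-formatted text lines (hypothetical input,
    e.g. the output of `samtools view`). Header lines starting with '@' are skipped.
    """
    counts = Counter({tag: 0 for tag in tags})
    n_reads = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a valid alignment record (11 mandatory columns)
        n_reads += 1
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE
        present = {field[:2] for field in fields[11:]}
        for tag in tags:
            if tag in present:
                counts[tag] += 1
        if n_reads >= max_reads:
            break  # a sample of reads is enough to see whether a tag is used
    return n_reads, dict(counts)
```

The lines could come from something like `samtools view possorted_bam.bam | head -n 100000`, piped into a small script that calls `count_tags(sys.stdin)`. A file with BX counts above zero but an HP count of zero would match what is reported in this thread for the merely position-sorted BAM.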