PacificBiosciences / HiPhase

Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads

How to Generate Haplotype 1 and 2 FASTA Files Using Phased VCF from HiPhase #51

Open ClarenceHsiang opened 4 days ago

ClarenceHsiang commented 4 days ago

Dear HiPhase team,

I attempted to use bcftools to generate haplotype sequences from the phased VCF, but the process was unsuccessful. Are there alternative solutions available? Below are the commands I used:

    bcftools consensus -f reference_genome.fasta -H 1 output_phased_variants.vcf.gz > haplotype1.fasta
    bcftools consensus -f reference_genome.fasta -H 2 output_phased_variants.vcf.gz > haplotype2.fasta

Sincerely,

Clarence

holtjma commented 2 days ago

Hi @ClarenceHsiang,

Can you elaborate on what in particular was unsuccessful? Was it an error in HiPhase or bcftools?

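Also, according to the docs (https://samtools.github.io/bcftools/bcftools.html#consensus), the -H N option (e.g. -H 1) picks the N-th allele of the genotype without accounting for phasing. It seems like you might want the NpIu form described there (e.g. -H 1pIu), which uses the N-th allele for phased genotypes and an IUPAC code for unphased ones.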
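To make the distinction concrete, here is a minimal Python sketch of how the two -H modes differ at a single-base site, based only on how the bcftools docs describe them. It is not bcftools code: `pick_allele` and the `IUPAC` table are hypothetical names introduced for illustration, and the sketch assumes diploid, non-missing genotypes and single-nucleotide alleles.

```python
# Illustrative only: mimics the documented behaviour of `-H N` versus `-H NpIu`
# for SNV sites. Assumes diploid, non-missing GT values such as "0|1" or "0/1".

# IUPAC ambiguity codes for two distinct bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def pick_allele(gt, ref, alts, hap, phase_aware):
    """Return the base written for haplotype `hap` (1 or 2) at one SNV site.

    phase_aware=False models `-H N`: take the N-th allele, ignoring phasing.
    phase_aware=True models `-H NpIu`: take the N-th allele only when the
    genotype is phased, otherwise emit an IUPAC code for the genotype.
    """
    alleles = [ref] + list(alts)
    phased = "|" in gt
    indices = [int(i) for i in gt.replace("|", "/").split("/")]
    chosen = [alleles[i] for i in indices]

    if phase_aware and not phased:
        bases = frozenset(chosen)
        # A homozygous genotype needs no ambiguity code.
        return chosen[0] if len(bases) == 1 else IUPAC[bases]
    return chosen[hap - 1]


print(pick_allele("0|1", "A", ["G"], 2, phase_aware=True))   # phased het
print(pick_allele("0/1", "A", ["G"], 2, phase_aware=False))  # unphased het, -H 2 style
print(pick_allele("0/1", "A", ["G"], 2, phase_aware=True))   # unphased het, -H 2pIu style
```

The point of the sketch is that with the plain index mode an unphased heterozygous call is still assigned to a haplotype, whereas the phase-aware mode marks it as ambiguous instead.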

I'm unaware of alternatives, but I have not tried to create these sequences before.

Matt

ClarenceHsiang commented 1 day ago

Thank you very much for your response. The relevant error messages are as follows:

[Clarence]$ /data/share/tools/bcftools-1.9/bcftools consensus -f 221021hap1_assembly.fa -H 1 hap1.out_phased.vcf.gz > hap1_phased_haplotype1.fasta
The site Chr1_1:4195225 overlaps with another variant, skipping...
The site Chr1_1:4201421 overlaps with another variant, skipping...
The site Chr1_1:4202106 overlaps with another variant, skipping...
The site Chr1_1:28824762 overlaps with another variant, skipping...
The site Chr1_1:30388601 overlaps with another variant, skipping...
The fasta sequence does not match the REF allele at Chr2_1:28577524:
   .vcf: [A]
   .vcf: [ATATTATTTTACTATGCAAATG] <- (ALT)
   .fa: [G]TATTATTTGCAAACGTATAGCCGGGCGGCTATACTGTTTAAATGCGGCGGTACTATAACACTTTATAAACACACCCAAAACCCATCATTACACACACTCAAAACTCATTTATAAACACACCCAAAAAGTTGACTTTAGGGTTAATTTTGGGTGTGTTTCTATGTTTAAATAGTGCTTGGGTTTTTTTATAAATACATGTCTAGGATTGGGTTTATTTATAAATTTGTATTAAATCTATTCTTTTTAAACGTACAACCGCGCGGCTATAATATTTACAACGATGATAAAGGGTTTCTTTTTGTCATCATCCTCAGCCATGAGTTCTGTCATAGACACTCTATTTACTTAAGTTTACAATGTAAATAAGGAAGTATCCCCTATTTGGATTCAAATAAAAATGCTAGAAAATCAAATTCCCACCAAATCAAAATATAAAAAATCGGTTTTAGTGCAAAATACGATTTAGGCTAAAAATGATGGCTCATTAAGTCAATATATACTTTATACTAAAATTCCATAAAATCAAGTTTTCTTATTTGATGTGTAAAGAATAAAAGCCGCCAAAAATTATCTTGAGAAAATGATTATTTTGACAAACCATTTTTCATACTTAGTCAATATTTTTATATAAAATATTTTTTTCATGTATGATATGAATGCATGAAGGGACCTAAGGTTTTTTTTAGTTTGTGAAATTTTGATTTTTTTTCTTTCTATTGGGTTTAATAGGGGAAAAGGCTTTAAAGATATTTGCTAAATCGCTACTCTTTGTCTATCTCTTACAATTTTTCTTGCTTAAAATGTATTTGTCTATCTATTTGTCGAGGACAATTATGATTGCAACCAATAAGCAATAAATTCTTGACATATGTGATATATATATATATATACTATTTTTCGTATGTATTGTGAGTTCGAACTTTTGATTCTTTATAATTTATAAGAGGCCCCCACACACATGAAATCATGACTTCGCCATTGGTTCTGTTAATGGGGTTTTGTAAGATTCAGTTTA

holtjma commented 1 day ago

Hi @ClarenceHsiang,

It's starting to look like the issues you're encountering are not problems with HiPhase running or output, but instead either a usage or run-time error with bcftools consensus. We don't have the ability to patch bcftools, so I'd recommend opening an issue there to see if they can help resolve the issue you're experiencing: https://github.com/samtools/bcftools/issues
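Before filing, it may help to narrow down the "fasta sequence does not match the REF allele" message. One possible cause (an assumption, not something confirmed in this thread) is that the FASTA passed to -f is not the same reference the VCF was called against. Here is a minimal Python sketch that lists records whose REF allele disagrees with the FASTA; `read_fasta` and `ref_mismatches` are hypothetical helper names, and the sketch assumes an uncompressed, tab-delimited VCF and a FASTA small enough to hold in memory.

```python
# Sketch: report VCF records whose REF allele does not match the FASTA.
# Both helpers take iterables of text lines, so they work on open file
# handles, e.g. open("ref.fa") or gzip.open("calls.vcf.gz", "rt").


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header, as VCF CHROM names do.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def ref_mismatches(vcf_lines, genome):
    """Return (chrom, pos, vcf_ref, fasta_bases) for each disagreeing record."""
    mismatches = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos) - 1  # VCF positions are 1-based
        found = genome.get(chrom, "")[start:start + len(ref)]
        if found.upper() != ref.upper():
            mismatches.append((chrom, int(pos), ref, found))
    return mismatches


# Tiny in-memory demonstration with made-up data.
demo_genome = read_fasta([">chr1 demo", "ACGT", "ACGT"])
demo_vcf = [
    "##fileformat=VCFv4.2",
    "chr1\t2\t.\tC\tT\t.\t.\t.\tGT\t0|1",
    "chr1\t5\t.\tG\tGTT\t.\t.\t.\tGT\t1|0",
]
print(ref_mismatches(demo_vcf, demo_genome))
```

If every record matches, the mismatch is more likely introduced during consensus building itself (for example by how overlapping variants are applied), which would be a question for the bcftools developers.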

If you can trace it to an error from HiPhase and provide details here, then I'm happy to help!

Matt