Haploid Haplotype Reconstruction

What is your question? @eblerjana I am working on reconstructing a haploid haplotype using the imputed genotypes from PanGenie. Currently, I am using the following commands:

PanGenie -i Reads.fq -r MHC-CHM13.ref.fa -v MHC_49-MC.vcf -o temp/APD_PG -t32 && bgzip temp/APD_PG_genotyping.vcf
tabix -p vcf temp/APD_PG_genotyping.vcf.gz && rm -rf APD_rec_PG.fasta
bcftools view -e 'GT="het"' temp/APD_PG_genotyping.vcf.gz | bgzip > temp/APD_PG_genotyping_no_homo.vcf.gz && tabix -p vcf temp/APD_PG_genotyping_no_homo.vcf.gz
bcftools consensus -f MHC-CHM13.ref.fa -o Rec_PG.fasta temp/APD_PG_genotyping_no_homo.vcf.gz

In the commands above, I genotype the haploid reads with PanGenie, exclude the heterozygous calls, and then apply the remaining genotypes to the reference with bcftools consensus to reconstruct the haploid haplotype.
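Since the reads are haploid, it may be worth checking how many calls fall into each genotype class before the heterozygous ones are dropped. Below is a minimal sketch of such a check. The function name count_gt_classes is hypothetical, and it assumes an uncompressed single-sample VCF on stdin with GT as the first FORMAT key:

```shell
# Hypothetical helper: tally genotype classes for the first sample of a VCF on stdin.
# Assumes GT is the first FORMAT key, as the VCF spec requires when GT is present.
count_gt_classes() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      split($10, fields, ":")              # first sample column; GT is fields[1]
      n = split(fields[1], a, "[/|]")      # alleles, phased or unphased
      if (fields[1] ~ /\./)                cls = "missing"
      else if (n == 2 && a[1] != a[2])     cls = "het"
      else if (a[1] == "0")                cls = "hom_ref"
      else                                 cls = "hom_alt"
      count[cls]++
    }
    END {
      printf "hom_ref=%d hom_alt=%d het=%d missing=%d\n",
             count["hom_ref"], count["hom_alt"], count["het"], count["missing"]
    }'
}
```

For example, it could be run as: bcftools view temp/APD_PG_genotyping.vcf.gz | count_gt_classes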

My question is: Is this the correct way to use PanGenie to reconstruct haplotypes? The input VCF is a phased diploid VCF generated by the minigraph-cactus pipeline and preprocessed with the "prepare-mc-vcf" pipeline.

If applicable: which version of PanGenie are you using? v3.1.0

If applicable: how did you run PanGenie? Please provide the command lines used. Did you run it using Singularity? I installed PanGenie with conda.

If applicable: what data are you running PanGenie on? Which species are you analyzing? Which input reads are used? What does the input VCF look like (number of input samples, how was it produced etc.)? An MHC VCF generated with the Minigraph-Cactus pipeline and preprocessed with the "prepare-mc-vcf" pipeline.

eblerjana / pangenie

Haploid Haplotype Reconstruction #88