eblerjana / genotyping-pipelines


About the workflow of "Create PanGenie-ready VCF from Minigraph-Cactus VCF" #2

Status: Open. JhinAir opened this issue 1 day ago

JhinAir commented 1 day ago

Dear Dr. Ebler,

When I ran the pipeline using the human hprc-v1.1 MC graph VCF (hprc-v1.1-mc-chm13.vcfbub.a100k.wave.vcf.gz) and GFA (hprc-v1.1-mc-chm13.gfa), I encountered an exception during the 'annotate_vcf' step, with the error message "assert nr_alleles > 1". I have attached the log file, the error file and the config file for your reference. Do you have any insight into what might be causing this error? Thank you very much.

Best regards,
Jing Liu

Attachment: issue.zip

eblerjana commented 18 hours ago

Hi,

This pipeline only works for VCFs representing graph bubbles as produced by Minigraph-Cactus using the --vcf option. The VCF you are using has already been decomposed with an alternative approach based on vcfwave, which is not compatible with PanGenie. If you want to use the HPRC data, follow these steps:

  1. Get the raw Minigraph-Cactus HPRC VCF: https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-chm13/hprc-v1.1-mc-chm13.raw.vcf.gz
  2. Run vcfbub to remove non-top-level bubbles: vcfbub -l 0 -r 100000 --input hprc-v1.1-mc-chm13.raw.vcf.gz | bgzip > mc.vcf.gz && tabix -p vcf mc.vcf.gz
  3. Use the resulting VCF (mc.vcf.gz) as input to this pipeline.
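As a rough illustration of what step 2 filters, the sketch below keeps only records whose `LV` INFO tag (the bubble nesting level, assumed here to be present as in raw Minigraph-Cactus VCFs) is at most 0 and whose REF allele is at most 100,000 bp. This is a simplified assumption, not a reimplementation of vcfbub: vcfbub also "pops" bubbles, keeping the nested children of a top-level site that was removed for being too long, which this sketch does not do. Use vcfbub itself on real data; the function names `parse_info`, `filter_top_level` and `filter_file` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column (e.g. 'AT=...;LV=0;PS=...') into a dict.
    Flag entries without '=' are stored as True."""
    fields = {}
    if info == ".":
        return fields
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def filter_top_level(lines: Iterable[str], max_level: int = 0,
                     max_ref_length: int = 100_000) -> Iterator[str]:
    """Yield header lines plus records with LV <= max_level and
    len(REF) <= max_ref_length. Records without an LV tag are treated
    as top-level (an assumption). No bubble popping is performed."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        ref, info = cols[3], cols[7]
        level = int(parse_info(info).get("LV", 0))
        if level <= max_level and len(ref) <= max_ref_length:
            yield line


def filter_file(path_in: str, path_out: str) -> None:
    """Read a (possibly gzipped) VCF and write the kept lines as plain text."""
    opener = gzip.open if path_in.endswith(".gz") else open
    with opener(path_in, "rt") as fin, open(path_out, "w") as fout:
        fout.writelines(filter_top_level(fin))
```

Streaming line by line keeps memory use constant for a genome-wide VCF. The output is uncompressed, so it would still need bgzip and tabix, as in step 2, before being used as pipeline input.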

Note that with newer versions of Minigraph-Cactus, the VCF produced with --vcf has already been processed with vcfbub. Such VCFs can therefore be used directly with this pipeline, without running the command in step 2. VCFs produced with MC's --vcfwave option (like the one you have) are not compatible.
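A quick pre-flight scan can show whether an input VCF contains records that might trip an assertion like `assert nr_alleles > 1`. The sketch assumes that `nr_alleles` counts the REF allele plus all non-missing ALT alleles, so a record with ALT `.` has only one allele; this is an assumption about the pipeline's internals, not something confirmed by its code. The function names `count_alleles`, `count_low_allele_records` and `scan_file` are hypothetical. A clean result does not prove compatibility: as stated in the reply above, VCFs produced with `--vcfwave` are unsupported regardless.

```python
import gzip
from typing import Iterable, List, Tuple


def count_alleles(alt: str) -> int:
    """Number of alleles in a record: REF plus each non-missing ALT allele."""
    alts = [a for a in alt.split(",") if a != "."]
    return 1 + len(alts)


def count_low_allele_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (CHROM, POS) for every record with fewer than two alleles."""
    flagged = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if count_alleles(cols[4]) < 2:
            flagged.append((cols[0], cols[1]))
    return flagged


def scan_file(path: str) -> List[Tuple[str, str]]:
    """Scan a (possibly gzipped) VCF file for low-allele records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_low_allele_records(fh)
```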

JhinAir commented 8 hours ago

I see. I will retry. Thank you very much for your detailed help!