Pangenome-based genome inference
Issue with VCF (does not represent a pangenome graph) #82

Closed (ChiaraF32 closed this issue 1 month ago)

ChiaraF32 commented 1 month ago

Hi @eblerjana,

I am running PanGenie with the HPRC-CHM13 (88 haplotypes) dataset VCF files, as you've specified here.

However, I am getting an error message suggesting that the VCF is not suitable:

Determine allele sequences ...
Read reference genome ...
Found 25 chromosome(s) from the reference file.
Read input VCF ...
terminate called after throwing an instance of 'std::runtime_error'
  what():  GraphBuilder: variant at chr1:11714 overlaps previous one. VCF does not represent a pangenome graph.

I have copied my run script below for context. I am running PanGenie version 3.0.1.

# Skip the CSV header, then read one sample per row: name, FASTQ 1, FASTQ 2
awk -F, 'NR>1 {print $1, $2, $3}' "$input_file" | while read -r sample fq1 fq2; do
    echo "Processing sample: $sample"

    # Run PanGenie
    singularity exec -B /scratch/pawsey0933/cfolland/pangenie "${image_name}" PanGenie \
        -i <(zcat "${fq1}" "${fq2}") \
        -r "${ref}" \
        -v "${graph_vcf}" \
        -t 23 \
        -j 23 \
        -o "${outdir}/${sample}"

    # Decompose bubbles
    python3 "${decomp_script}" "${graph_vcf}" < "${outdir}/${sample}_genotyping.vcf" \
        > "${outdir}/${sample}_genotyping_biallelic.vcf"

    # Collect vcf stats
    bcftools stats "${outdir}/${sample}_genotyping_biallelic.vcf" > "${outdir}/${sample}_genotyping_biallelic.stats"
done

eblerjana commented 1 month ago


Please use the VCF provided in the column "PanGenie Input VCF". You seem to be using the "Callset VCF".
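
To confirm which file is being passed with -v before starting a long run, a rough pre-check along these lines may help. It is only a sketch and not part of PanGenie. It assumes the VCF is sorted by position, and it treats a record as overlapping when its POS falls at or before the last reference base covered by an earlier record on the same chromosome, which may differ from the exact test PanGenie applies. The names `find_overlaps` and `open_vcf` are made up for this example.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def find_overlaps(lines, limit=10):
    """Return up to `limit` tuples (chrom, pos, prev_end) for records that
    start at or before the end of an earlier record on the same chromosome.

    `lines` is any iterable of VCF text lines, assumed sorted by position.
    """
    overlaps = []
    prev_chrom, prev_end = None, 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        end = pos + len(ref) - 1  # last reference base covered by REF
        if chrom == prev_chrom and pos <= prev_end:
            overlaps.append((chrom, pos, prev_end))
            if len(overlaps) >= limit:
                break
        if chrom != prev_chrom:
            prev_chrom, prev_end = chrom, end
        else:
            prev_end = max(prev_end, end)
    return overlaps
```

With a file on disk this could be called as `with open_vcf(path) as handle: print(find_overlaps(handle))`, where `path` is whatever is assigned to graph_vcf in the script above. An empty list means no overlaps were found under the assumption stated here; any entries point to positions like the chr1:11714 record named in the error message.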