dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

malformed merged gVCF from DV config #303

Closed: Overcraft90 closed this issue 6 months ago

Overcraft90 commented 6 months ago

Hi there,

I'm running some analyses on the SGDP panel and need to create a joint VCF from the output of the Giraffe-DeepVariant pipeline. Everything through variant calling went smoothly. When I contacted the Google team, they confirmed there are few options for this step comparable to what GATK offers.

Then, looking at the DeepVariant GitHub page, I found GLnexus and used it to merge the 279 gVCFs from the SGDP. There are no evident errors (see the merge.log file); however, the output VCF is malformed (see screenshot). The header and sample names look fine, but the body of the file is corrupted. For reference, here is the script I ran:

#!/bin/bash
#
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=32
#SBATCH --time=24:00:00
#SBATCH --mem=450gb
#
#SBATCH --job-name=merge
#SBATCH --output=merge.out
#
#SBATCH --partition=<partition>
#
#SBATCH --account=<account_name>

cd /path/to/folder

singularity run -B /path/to/folder glnexus_v1.4.1.sif \
  glnexus_cli \
  --config DeepVariantWGS \
  --dir temp_merge \
  --list to_merge.list \
  --threads 32 > SGDP_panel.vcf

[Screenshot, 2024-01-14 11:26:44 AM: the malformed output VCF]
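One way to check what kind of file actually landed in SGDP_panel.vcf is to look at its first few bytes. An uncompressed BCF starts with the magic bytes "BCF" plus two version bytes, and stores its header as text followed by binary records. Opened in a text viewer, that would look like a readable header with a garbled body. The helper below is a minimal sketch; the name detect_vcf_format is hypothetical and it only distinguishes a few common cases by magic number.

```shell
# Hypothetical helper: guess the container format of a VCF/BCF file
# from its first five bytes (magic numbers).
detect_vcf_format() {
  local magic
  # Hex-encode the first 5 bytes, e.g. "4243460202" for "BCF\2\2".
  magic=$(head -c 5 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    42434602*) echo "uncompressed BCF" ;;       # "BCF\2" + minor version byte
    1f8b*)     echo "gzip/BGZF-compressed" ;;   # bgzipped VCF or compressed BCF
    2323*)     echo "plain-text VCF" ;;         # starts with "##fileformat=..."
    *)         echo "unknown" ;;
  esac
}
```

For example, `detect_vcf_format SGDP_panel.vcf` would report which of these cases applies to the file written by the original script.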

Overcraft90 commented 6 months ago

UPDATE

I figured out the issue: by default, GLnexus writes BCF (binary VCF) to standard output, not plain-text VCF. Simply piping the output through bcftools, instead of redirecting it to SGDP_panel.vcf:

| ./bcftools view --threads 32 -O z -o SGDP_panel.vcf.gz

produced the expected (bgzip-compressed) VCF file.
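Putting the fix together, the merge-and-convert step could be wrapped as below. This is a sketch under assumptions: merge_to_vcf is a hypothetical name, bcftools is assumed to be on PATH (the update calls it as ./bcftools), and the bind path, image name and GLnexus options are copied from the original script. `set -o pipefail` is added so that a glnexus_cli failure is not masked by bcftools exiting successfully, and `bcftools index -t` writes a tabix index for the compressed VCF.

```shell
# Hypothetical helper (name and arguments are illustrative): run the GLnexus
# merge from the original script and convert its BCF output to a bgzipped VCF.
# Assumes bcftools is on PATH; paths and options are copied from the script.
merge_to_vcf() (
  set -euo pipefail            # fail if glnexus_cli fails, not just bcftools
  list="$1"                    # file listing the input gVCFs, one per line
  out="$2"                     # output path, e.g. SGDP_panel.vcf.gz
  threads="${3:-32}"           # default matches --cpus-per-task=32

  singularity run -B /path/to/folder glnexus_v1.4.1.sif \
    glnexus_cli \
    --config DeepVariantWGS \
    --dir temp_merge \
    --list "$list" \
    --threads "$threads" \
  | bcftools view --threads "$threads" -O z -o "$out"

  # Write a tabix (.tbi) index next to the compressed VCF.
  bcftools index -t "$out"
)
```

In the Slurm job, the final command would then become `merge_to_vcf to_merge.list SGDP_panel.vcf.gz 32`.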