cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium
https://cumc.github.io/xqtl-protocol/
MIT License

New error in vcfQC #576

Closed hsun3163 closed 1 year ago

hsun3163 commented 1 year ago

When rerunning the vcfQC qc_1 step on the protocol data, the following error occurs:

hs3163@node101:/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing$ singularity exec  containers/bioinfo.sif /bin/bash /home/hs3163/.sos/b9a19f5ebdf9ee71/singularity_run_25319.sh > tmp
Error: ambiguous filtering expression, both INFO/DP and FORMAT/DP are defined in the VCF header.

The error is triggered by the following code, where 'VD=sum(DP)' in the last command does not say which DP to use:

# split multiallelic sites into biallelic records
bcftools norm -m-any  input_data/Genotype/DEJ_11898_B01_GRM_WGS_2017-05-15_21.recalibrated_variants.xqtl_protocol_data.add_chr.add_chr.vcf.gz |\
# when incorrect or missing REF allele is encountered: warn (w) and set/fix (s) bad sites, no left normalization is done.
bcftools norm -d exact -N --check-ref ws -f reference_data/GRCh38_full_analysis_set_plus_decoy_hla.noALT_noHLA_noDecoy_ERCC.fasta  --threads 1 | \
bcftools +fill-tags -- -t all,F_MISSING,'VD=sum(DP)'

DP is defined twice in the VCF header, once as a FORMAT tag and once as an INFO tag:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">

So vcfQC should explicitly take DP from the FORMAT field (FORMAT/DP).
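
A minimal sketch of that change, assuming bcftools accepts the FORMAT/ prefix inside the +fill-tags expression (the error message itself names INFO/DP and FORMAT/DP as the two candidates). The helper name vd_expr is hypothetical and not part of the protocol, and this is not necessarily the fix that was merged. It only inspects header text, so the bcftools call is left as a comment:

```shell
# Hypothetical helper: read VCF header lines on stdin and print a
# fill-tags expression that names FORMAT/DP explicitly.
vd_expr() {
    # Succeed only when the header declares a per-sample (FORMAT) DP tag.
    if grep -q '^##FORMAT=<ID=DP,'; then
        printf '%s\n' 'VD=sum(FORMAT/DP)'
    else
        echo 'no FORMAT/DP in header; cannot compute VD' >&2
        return 1
    fi
}

# Usage sketch (assumes bcftools is on PATH; "$vcf" is a placeholder path):
#   expr=$(bcftools view -h "$vcf" | vd_expr) &&
#   bcftools +fill-tags "$vcf" -- -t "all,F_MISSING,$expr"
```

Spelling out the scope avoids relying on how bcftools resolves a bare DP when both tags are present in the header.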

hsun3163 commented 1 year ago

This error occurs with bcftools 1.16. The version tested previously was 1.14 (the VCF header records ##bcftools_viewVersion=1.14+htslib-1.14), so bioinfo.sif should be reverted to bcftools 1.14.
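
To compare the version recorded in a VCF header with the one shipped in the container, something like the following could be used. The function name recorded_bcftools_versions is hypothetical, and the sketch assumes the header lines follow the ##bcftools_<command>Version=<version>+htslib-<version> pattern shown above:

```shell
# Hypothetical helper: read VCF header lines on stdin and print each
# distinct bcftools version recorded there (the part before "+htslib").
recorded_bcftools_versions() {
    sed -n 's/^##bcftools_[A-Za-z]*Version=\([^+]*\).*/\1/p' | sort -u
}

# Usage sketch (assumes bcftools and singularity are available;
# "$vcf" is a placeholder path):
#   bcftools view -h "$vcf" | recorded_bcftools_versions
#   singularity exec containers/bioinfo.sif bcftools --version | head -n 1
```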

hsun3163 commented 1 year ago

As it turns out, the problem persists after switching the bcftools version, so the error is not caused by the version.

hsun3163 commented 1 year ago

A fix has been implemented to make the code more robust.