Currently, `vembrane table` fails on multi-sample VCF files that contain missing entries in the per-sample FORMAT fields (e.g. because the variant was not reported for that sample).
Multi-sample VCF files produced by somatic variant calling (e.g. Mutect2) are not affected, but VCF files concatenated from the output of multiple variant callers are affected and produce the error below.
The proposed solution is to fix the handling of missing entries within vembrane, as addressed in this issue: https://github.com/vembrane/vembrane/issues/171.
```
-[cio-abcd/variantinterpretation] Pipeline completed with errors-
ERROR ~ Error executing process > 'CIOABCD_VARIANTINTERPRETATION:VARIANTINTERPRETATION:VEMBRANE_TABLE:VEMBRANE_VEMBRANETABLE (test_T1)'
Caused by:
Process `CIOABCD_VARIANTINTERPRETATION:VARIANTINTERPRETATION:VEMBRANE_TABLE:VEMBRANE_VEMBRANETABLE (test_T1)` terminated with an error exit status (1)
Command executed:
vembrane table \
--output test_T1.tsv \
--header 'CHROM,POS,ID,REF,ALT,QUAL,FILTER,for_each_sample(lambda sample: f"allele_fraction{sample}"),for_each_sample(lambda sample: f"read_depth{sample}]"),for_each_sample(lambda sample: f"FORMAT_GT[{sample}]"),for_each_sample(lambda sample: f"FORMAT_AD[{sample}][0]"),for_each_sample(lambda sample: f"FORMAT_AD[{sample}][1]"),CSQ_Allele,CSQ_Consequence,CSQ_IMPACT,CSQ_SYMBOL,CSQ_Gene,CSQ_Feature_type,CSQ_Feature,CSQ_BIOTYPE,CSQ_EXON,CSQ_INTRON,CSQ_HGVSc,CSQ_HGVSp,CSQ_cDNA_position,CSQ_CDS_position,CSQ_Protein_position,CSQ_Amino_acids,CSQ_Codons,CSQ_Existing_variation,CSQ_DISTANCE,CSQ_STRAND,CSQ_FLAGS,CSQ_PICK,CSQ_VARIANT_CLASS,CSQ_SYMBOL_SOURCE,CSQ_HGNC_ID,CSQ_CANONICAL,CSQ_MANE_SELECT,CSQ_MANE_PLUS_CLINICAL,CSQ_TSL,CSQ_APPRIS,CSQ_CCDS,CSQ_ENSP,CSQ_SWISSPROT,CSQ_TREMBL,CSQ_UNIPARC,CSQ_UNIPROT_ISOFORM,CSQ_REFSEQ_MATCH,CSQ_REFSEQ_OFFSET,CSQ_GIVEN_REF,CSQ_USED_REF,CSQ_BAM_EDIT,CSQ_GENE_PHENO,CSQ_SIFT,CSQ_PolyPhen,CSQ_DOMAINS,CSQ_miRNA,CSQ_HGVS_OFFSET,CSQ_AF,CSQ_AFR_AF,CSQ_AMR_AF,CSQ_EAS_AF,CSQ_EUR_AF,CSQ_SAS_AF,CSQ_gnomADe_AF,CSQ_gnomADe_AFR_AF,CSQ_gnomADe_AMR_AF,CSQ_gnomADe_ASJ_AF,CSQ_gnomADe_EAS_AF,CSQ_gnomADe_FIN_AF,CSQ_gnomADe_NFE_AF,CSQ_gnomADe_OTH_AF,CSQ_gnomADe_SAS_AF,CSQ_gnomADg_AF,CSQ_gnomADg_AFR_AF,CSQ_gnomADg_AMI_AF,CSQ_gnomADg_AMR_AF,CSQ_gnomADg_ASJ_AF,CSQ_gnomADg_EAS_AF,CSQ_gnomADg_FIN_AF,CSQ_gnomADg_MID_AF,CSQ_gnomADg_NFE_AF,CSQ_gnomADg_OTH_AF,CSQ_gnomADg_SAS_AF,CSQ_MAX_AF,CSQ_MAX_AF_POPS,CSQ_CLIN_SIG,CSQ_SOMATIC,CSQ_PHENO,CSQ_PUBMED,CSQ_VAR_SYNONYMS,CSQ_MOTIF_NAME,CSQ_MOTIF_POS,CSQ_HIGH_INF_POS,CSQ_MOTIF_SCORE_CHANGE,CSQ_TRANSCRIPTION_FACTORS' \
--annotation-key CSQ \
'CHROM,POS,ID,REF,ALT,QUAL,FILTER,for_each_sample(lambda s: FORMAT["AD"][s][1]/FORMAT["DP"][s]),for_each_sample(lambda s: FORMAT["DP"][s]),for_each_sample(lambda s: FORMAT["GT"][s]),for_each_sample(lambda s: FORMAT["AD"][s][0]),for_each_sample(lambda s: FORMAT["AD"][s][1]),CSQ["Allele"],CSQ["Consequence"],CSQ["IMPACT"],CSQ["SYMBOL"],CSQ["Gene"],CSQ["Feature_type"],CSQ["Feature"],CSQ["BIOTYPE"],CSQ["EXON"],CSQ["INTRON"],CSQ["HGVSc"],CSQ["HGVSp"],CSQ["cDNA_position"],CSQ["CDS_position"],CSQ["Protein_position"],CSQ["Amino_acids"],CSQ["Codons"],CSQ["Existing_variation"],CSQ["DISTANCE"],CSQ["STRAND"],CSQ["FLAGS"],CSQ["PICK"],CSQ["VARIANT_CLASS"],CSQ["SYMBOL_SOURCE"],CSQ["HGNC_ID"],CSQ["CANONICAL"],CSQ["MANE_SELECT"],CSQ["MANE_PLUS_CLINICAL"],CSQ["TSL"],CSQ["APPRIS"],CSQ["CCDS"],CSQ["ENSP"],CSQ["SWISSPROT"],CSQ["TREMBL"],CSQ["UNIPARC"],CSQ["UNIPROT_ISOFORM"],CSQ["REFSEQ_MATCH"],CSQ["REFSEQ_OFFSET"],CSQ["GIVEN_REF"],CSQ["USED_REF"],CSQ["BAM_EDIT"],CSQ["GENE_PHENO"],CSQ["SIFT"],CSQ["PolyPhen"],CSQ["DOMAINS"],CSQ["miRNA"],CSQ["HGVS_OFFSET"],CSQ["AF"],CSQ["AFR_AF"],CSQ["AMR_AF"],CSQ["EAS_AF"],CSQ["EUR_AF"],CSQ["SAS_AF"],CSQ["gnomADe_AF"],CSQ["gnomADe_AFR_AF"],CSQ["gnomADe_AMR_AF"],CSQ["gnomADe_ASJ_AF"],CSQ["gnomADe_EAS_AF"],CSQ["gnomADe_FIN_AF"],CSQ["gnomADe_NFE_AF"],CSQ["gnomADe_OTH_AF"],CSQ["gnomADe_SAS_AF"],CSQ["gnomADg_AF"],CSQ["gnomADg_AFR_AF"],CSQ["gnomADg_AMI_AF"],CSQ["gnomADg_AMR_AF"],CSQ["gnomADg_ASJ_AF"],CSQ["gnomADg_EAS_AF"],CSQ["gnomADg_FIN_AF"],CSQ["gnomADg_MID_AF"],CSQ["gnomADg_NFE_AF"],CSQ["gnomADg_OTH_AF"],CSQ["gnomADg_SAS_AF"],CSQ["MAX_AF"],CSQ["MAX_AF_POPS"],CSQ["CLIN_SIG"],CSQ["SOMATIC"],CSQ["PHENO"],CSQ["PUBMED"],CSQ["VAR_SYNONYMS"],CSQ["MOTIF_NAME"],CSQ["MOTIF_POS"],CSQ["HIGH_INF_POS"],CSQ["MOTIF_SCORE_CHANGE"],CSQ["TRANSCRIPTION_FACTORS"]' \
test_T1.filt.vcf
cat <<-END_VERSIONS > versions.yml
"CIOABCD_VARIANTINTERPRETATION:VARIANTINTERPRETATION:VEMBRANE_TABLE:VEMBRANE_VEMBRANETABLE":
vembrane: $(echo $(vembrane --version 2>&1) | sed 's/^.*vembrane //; s/Using.*$//' ))
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
No type information available for 'PICK', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'UNIPROT_ISOFORM', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'REFSEQ_MATCH', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'REFSEQ_OFFSET', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'BAM_EDIT', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'gnomADg_AMI_AF', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'gnomADg_MID_AF', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'VAR_SYNONYMS', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
No type information available for 'TRANSCRIPTION_FACTORS', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
vembrane only supports records with one alternative allele.
Please split multi-allelic records first, for example with `bcftools norm -m-any […]` or `gatk LeftAlignAndTrimVariants […] --split-multi-allelics` or `vcfmulti2oneallele […]`
```
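An affected record: the `normal` sample contains only missing values (`./.:.:.:.`). Note that the record has a single ALT allele, so the multi-allelic message in the error above does not describe the actual problem.

```
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor normal
chr17 7673803 . G A . . . GT:AD:AF:DP 0/1:11,75:0.864:86 ./.:.:.:.
```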
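To illustrate the kind of missing-aware handling requested in vembrane/vembrane#171, here is a minimal standalone sketch in plain Python. It is not vembrane's code or API: the helper names `parse_value`, `parse_samples` and `allele_fraction` are hypothetical. It assumes tab-separated VCF lines, treats a FORMAT value consisting only of the missing marker `.` (including `./.` for GT) as `None`, pads dropped trailing FORMAT fields as missing (which the VCF spec allows), and computes the same quantity as the `FORMAT["AD"][s][1]/FORMAT["DP"][s]` expression above, returning `None` instead of raising.

```python
import re
from typing import Dict, List, Optional

# The affected record from this issue; real VCF columns are tab-separated.
RECORD = "chr17\t7673803\t.\tG\tA\t.\t.\t.\tGT:AD:AF:DP\t0/1:11,75:0.864:86\t./.:.:.:."
SAMPLES = ["tumor", "normal"]


def parse_value(raw: str) -> Optional[str]:
    """Return None if every component of the value is the missing marker '.'.

    Components are split on the GT separators ('/', '|') and the list separator
    (','), so '.', './.' and '.,.' count as missing; '0/1' and '11,75' do not.
    """
    parts = re.split(r"[/|,]", raw)
    return None if all(part == "." for part in parts) else raw


def parse_samples(line: str, samples: List[str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each sample name to its FORMAT key/value pairs, with missing values as None."""
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    parsed = {}
    for name, column in zip(samples, fields[9:]):
        values = column.split(":")
        # Trailing FORMAT fields may be dropped in a VCF; treat them as missing.
        values += ["."] * (len(keys) - len(values))
        parsed[name] = {key: parse_value(value) for key, value in zip(keys, values)}
    return parsed


def allele_fraction(sample: Dict[str, Optional[str]]) -> Optional[float]:
    """Compute AD[1] / DP, returning None instead of raising when data is missing."""
    ad, dp = sample.get("AD"), sample.get("DP")
    if ad is None or dp is None:
        return None
    ad_values = ad.split(",")
    if len(ad_values) < 2 or ad_values[1] == "." or int(dp) == 0:
        return None
    return int(ad_values[1]) / int(dp)


if __name__ == "__main__":
    for name, sample in parse_samples(RECORD, SAMPLES).items():
        print(name, allele_fraction(sample))
```

Run as a script, this prints an allele fraction of 75/86 (about 0.872) for `tumor` and `None` for `normal`, rather than aborting on the whole file. Propagating `None` keeps one sample's missing call from breaking the row for the other samples.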
### System information
- Nextflow version 23.10.1
- Current `dev` version of the variantinterpretation pipeline (commit 245bbe2e6df7b4e5f7f3912b7659eaa22d49a5d9)
This error was already mentioned in PR #44.
The pipeline was run with the following parameters:

```json
{
  "input": "config/samplesheet.csv",
  "outdir": "results/",
  "vep_cache_version": "110",
  "vep_cache_source": "refseq",
  "transcriptfilter": "PICK",
  "fasta": "Homo_sapiens_assembly38.fasta",
  "population_db": "CSQ_MAX_AF",
  "calculate_tmb": false
}
```

Samplesheet (`config/samplesheet.csv`):

```
sample,vcf
test,testsample.vcf.gz
```
Test VCF:

```
##fileformat=VCFv4.2
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##contig=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor normal
chr17 7673803 . G A . . . GT:AD:AF:DP 0/1:11,75:0.864:86 ./.:.:.:.
```