cio-abcd / variantinterpretation

Collaborative Interpretation-Pipeline workflow based on nf-core pipeline structure
MIT License

Missing entries in multi-sample VCF file cause vembrane error #46

Open sci-kai opened 2 weeks ago

sci-kai commented 2 weeks ago

### Description of the bug

Currently, multi-sample VCF files with missing entries in the FORMAT fields (e.g. because the variant is not reported for a given sample) cause an error in `vembrane table`. Multi-sample VCF files produced directly by somatic variant calling (e.g. mutect2) are not affected, but VCF files concatenated from multiple variant callers are. The proposed solution is to fix the handling of missing entries within vembrane, which is addressed in this issue: https://github.com/vembrane/vembrane/issues/171.

The error was already mentioned in PR #44.
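
To illustrate the failure mode in plain Python (this is not vembrane's internal code): the allele-fraction expression divides `AD[1]` by `DP`, which cannot work when a sample has no values at all. The sketch below assumes missing entries are represented as `None`; the function name `allele_fraction` is hypothetical and only shows the kind of guard that would be needed.

```python
from typing import Optional, Sequence


def allele_fraction(
    ad: Optional[Sequence[Optional[int]]], dp: Optional[int]
) -> Optional[float]:
    """Return alt reads / depth, or None if a required value is missing."""
    # AD must hold at least a ref and an alt count, and DP must be non-zero.
    if ad is None or dp is None or len(ad) < 2 or ad[1] is None or dp == 0:
        return None
    return ad[1] / dp


# Sample with values, e.g. AD=11,75 and DP=86
print(allele_fraction([11, 75], 86))

# Sample where the variant is not reported (all entries missing)
print(allele_fraction(None, None))  # None
```

Without such a guard, the equivalent unguarded division would raise an exception for the sample with missing entries.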

### Command used and terminal output

nextflow command:

nextflow run cio-abcd/variantinterpretation \
      -params-file config/minimalconf.json \
      -profile singularity \
      --vep_cache db/vep \
      -resume

Error:

-[cio-abcd/variantinterpretation] Pipeline completed with errors-
ERROR ~ Error executing process > 'CIOABCD_VARIANTINTERPRETATION:VARIANTINTERPRETATION:VEMBRANE_TABLE:VEMBRANE_VEMBRANETABLE (test_T1)'

Caused by:
  Process `CIOABCD_VARIANTINTERPRETATION:VARIANTINTERPRETATION:VEMBRANE_TABLE:VEMBRANE_VEMBRANETABLE (test_T1)` terminated with an error exit status (1)

Command executed:

  vembrane table \
      --output test_T1.tsv \
      --header 'CHROM,POS,ID,REF,ALT,QUAL,FILTER,for_each_sample(lambda sample: f"allele_fraction{sample}"),for_each_sample(lambda sample: f"read_depth{sample}]"),for_each_sample(lambda sample: f"FORMAT_GT[{sample}]"),for_each_sample(lambda sample: f"FORMAT_AD[{sample}][0]"),for_each_sample(lambda sample: f"FORMAT_AD[{sample}][1]"),CSQ_Allele,CSQ_Consequence,CSQ_IMPACT,CSQ_SYMBOL,CSQ_Gene,CSQ_Feature_type,CSQ_Feature,CSQ_BIOTYPE,CSQ_EXON,CSQ_INTRON,CSQ_HGVSc,CSQ_HGVSp,CSQ_cDNA_position,CSQ_CDS_position,CSQ_Protein_position,CSQ_Amino_acids,CSQ_Codons,CSQ_Existing_variation,CSQ_DISTANCE,CSQ_STRAND,CSQ_FLAGS,CSQ_PICK,CSQ_VARIANT_CLASS,CSQ_SYMBOL_SOURCE,CSQ_HGNC_ID,CSQ_CANONICAL,CSQ_MANE_SELECT,CSQ_MANE_PLUS_CLINICAL,CSQ_TSL,CSQ_APPRIS,CSQ_CCDS,CSQ_ENSP,CSQ_SWISSPROT,CSQ_TREMBL,CSQ_UNIPARC,CSQ_UNIPROT_ISOFORM,CSQ_REFSEQ_MATCH,CSQ_REFSEQ_OFFSET,CSQ_GIVEN_REF,CSQ_USED_REF,CSQ_BAM_EDIT,CSQ_GENE_PHENO,CSQ_SIFT,CSQ_PolyPhen,CSQ_DOMAINS,CSQ_miRNA,CSQ_HGVS_OFFSET,CSQ_AF,CSQ_AFR_AF,CSQ_AMR_AF,CSQ_EAS_AF,CSQ_EUR_AF,CSQ_SAS_AF,CSQ_gnomADe_AF,CSQ_gnomADe_AFR_AF,CSQ_gnomADe_AMR_AF,CSQ_gnomADe_ASJ_AF,CSQ_gnomADe_EAS_AF,CSQ_gnomADe_FIN_AF,CSQ_gnomADe_NFE_AF,CSQ_gnomADe_OTH_AF,CSQ_gnomADe_SAS_AF,CSQ_gnomADg_AF,CSQ_gnomADg_AFR_AF,CSQ_gnomADg_AMI_AF,CSQ_gnomADg_AMR_AF,CSQ_gnomADg_ASJ_AF,CSQ_gnomADg_EAS_AF,CSQ_gnomADg_FIN_AF,CSQ_gnomADg_MID_AF,CSQ_gnomADg_NFE_AF,CSQ_gnomADg_OTH_AF,CSQ_gnomADg_SAS_AF,CSQ_MAX_AF,CSQ_MAX_AF_POPS,CSQ_CLIN_SIG,CSQ_SOMATIC,CSQ_PHENO,CSQ_PUBMED,CSQ_VAR_SYNONYMS,CSQ_MOTIF_NAME,CSQ_MOTIF_POS,CSQ_HIGH_INF_POS,CSQ_MOTIF_SCORE_CHANGE,CSQ_TRANSCRIPTION_FACTORS' \
      --annotation-key CSQ \
      'CHROM,POS,ID,REF,ALT,QUAL,FILTER,for_each_sample(lambda s: FORMAT["AD"][s][1]/FORMAT["DP"][s]),for_each_sample(lambda s: FORMAT["DP"][s]),for_each_sample(lambda s: FORMAT["GT"][s]),for_each_sample(lambda s: FORMAT["AD"][s][0]),for_each_sample(lambda s: FORMAT["AD"][s][1]),CSQ["Allele"],CSQ["Consequence"],CSQ["IMPACT"],CSQ["SYMBOL"],CSQ["Gene"],CSQ["Feature_type"],CSQ["Feature"],CSQ["BIOTYPE"],CSQ["EXON"],CSQ["INTRON"],CSQ["HGVSc"],CSQ["HGVSp"],CSQ["cDNA_position"],CSQ["CDS_position"],CSQ["Protein_position"],CSQ["Amino_acids"],CSQ["Codons"],CSQ["Existing_variation"],CSQ["DISTANCE"],CSQ["STRAND"],CSQ["FLAGS"],CSQ["PICK"],CSQ["VARIANT_CLASS"],CSQ["SYMBOL_SOURCE"],CSQ["HGNC_ID"],CSQ["CANONICAL"],CSQ["MANE_SELECT"],CSQ["MANE_PLUS_CLINICAL"],CSQ["TSL"],CSQ["APPRIS"],CSQ["CCDS"],CSQ["ENSP"],CSQ["SWISSPROT"],CSQ["TREMBL"],CSQ["UNIPARC"],CSQ["UNIPROT_ISOFORM"],CSQ["REFSEQ_MATCH"],CSQ["REFSEQ_OFFSET"],CSQ["GIVEN_REF"],CSQ["USED_REF"],CSQ["BAM_EDIT"],CSQ["GENE_PHENO"],CSQ["SIFT"],CSQ["PolyPhen"],CSQ["DOMAINS"],CSQ["miRNA"],CSQ["HGVS_OFFSET"],CSQ["AF"],CSQ["AFR_AF"],CSQ["AMR_AF"],CSQ["EAS_AF"],CSQ["EUR_AF"],CSQ["SAS_AF"],CSQ["gnomADe_AF"],CSQ["gnomADe_AFR_AF"],CSQ["gnomADe_AMR_AF"],CSQ["gnomADe_ASJ_AF"],CSQ["gnomADe_EAS_AF"],CSQ["gnomADe_FIN_AF"],CSQ["gnomADe_NFE_AF"],CSQ["gnomADe_OTH_AF"],CSQ["gnomADe_SAS_AF"],CSQ["gnomADg_AF"],CSQ["gnomADg_AFR_AF"],CSQ["gnomADg_AMI_AF"],CSQ["gnomADg_AMR_AF"],CSQ["gnomADg_ASJ_AF"],CSQ["gnomADg_EAS_AF"],CSQ["gnomADg_FIN_AF"],CSQ["gnomADg_MID_AF"],CSQ["gnomADg_NFE_AF"],CSQ["gnomADg_OTH_AF"],CSQ["gnomADg_SAS_AF"],CSQ["MAX_AF"],CSQ["MAX_AF_POPS"],CSQ["CLIN_SIG"],CSQ["SOMATIC"],CSQ["PHENO"],CSQ["PUBMED"],CSQ["VAR_SYNONYMS"],CSQ["MOTIF_NAME"],CSQ["MOTIF_POS"],CSQ["HIGH_INF_POS"],CSQ["MOTIF_SCORE_CHANGE"],CSQ["TRANSCRIPTION_FACTORS"]' \
      test_T1.filt.vcf

  cat <<-END_VERSIONS > versions.yml
  "CIOABCD_VARIANTINTERPRETATION:VARIANTINTERPRETATION:VEMBRANE_TABLE:VEMBRANE_VEMBRANETABLE":
      vembrane: $(echo $(vembrane --version 2>&1) | sed 's/^.*vembrane //; s/Using.*$//' ))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  No type information available for 'PICK', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'UNIPROT_ISOFORM', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'REFSEQ_MATCH', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'REFSEQ_OFFSET', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'BAM_EDIT', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'gnomADg_AMI_AF', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'gnomADg_MID_AF', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'VAR_SYNONYMS', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  No type information available for 'TRANSCRIPTION_FACTORS', defaulting to `str`. If you would like to have a custom type for this, please consider filing an issue at https://github.com/vembrane/vembrane/issues
  vembrane only supports records with one alternative allele.
  Please split multi-allelic records first, for example with `bcftools norm -m-any […]` or `gatk LeftAlignAndTrimVariants […] --split-multi-allelics` or `vcfmulti2oneallele […]`

### Relevant files

minimalconf.json:

{
    "input": "config/samplesheet.csv",
    "outdir": "results/",
    "vep_cache_version": "110",
    "vep_cache_source": "refseq",
    "transcriptfilter": "PICK",
    "fasta": "Homo_sapiens_assembly38.fasta",
    "population_db": "CSQ_MAX_AF",
    "calculate_tmb": false
}


samplesheet.csv:

sample,vcf
test,testsample.vcf.gz


testsample.vcf:

##fileformat=VCFv4.2
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##contig=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor normal
chr17 7673803 . G A . . . GT:AD:AF:DP 0/1:11,75:0.864:86 ./.:.:.:.
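
To check whether a given VCF is affected before running the pipeline, the sample columns can be scanned for missing FORMAT entries. The following is a minimal sketch using only the Python standard library. It assumes whitespace-separated columns as shown above, and the helper name `samples_with_missing_format` is hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List


def samples_with_missing_format(header_line: str, record_line: str) -> Dict[str, List[str]]:
    """Map each sample name to the FORMAT keys that are missing for it."""
    # Sample names start at the 10th column of the #CHROM header line.
    names = header_line.lstrip("#").split()[9:]
    fields = record_line.split()
    keys = fields[8].split(":")

    result: Dict[str, List[str]] = {}
    for name, sample in zip(names, fields[9:]):
        values = sample.split(":")
        missing = []
        # Trailing FORMAT values may be dropped entirely; treat them as ".".
        for key, value in zip_longest(keys, values, fillvalue="."):
            # Strip genotype and list separators, then test for dots only.
            stripped = value.replace("/", "").replace("|", "").replace(",", "")
            if set(stripped) <= {"."}:
                missing.append(key)
        if missing:
            result[name] = missing
    return result


header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor normal"
record = "chr17 7673803 . G A . . . GT:AD:AF:DP 0/1:11,75:0.864:86 ./.:.:.:."
print(samples_with_missing_format(header, record))
# {'normal': ['GT', 'AD', 'AF', 'DP']}
```

For the record above, only the `normal` sample is flagged, which matches the case that triggers the error.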



### System information

Nextflow version 23.10.1
current `dev` version of variantinterpretation pipeline (commit 245bbe2e6df7b4e5f7f3912b7659eaa22d49a5d9)