ambj / MuPeXI

MuPeXI: the mutant peptide extractor and informer, a tool for predicting neo-epitopes from tumor sequencing data.

Skipping VCF compatibility subroutine #16

Closed: acesnik closed this issue 5 years ago

acesnik commented 6 years ago

Hi there,

I am working with a VCF file produced by HaplotypeCaller in the GATK 4.0 suite. This type of file gets hung up in MuPeXI during the check of whether it comes from MuTect, and making a "vep_compatible" VCF only outputs the header. However, this VCF works fine directly in VEP using the --format vcf option.
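MuPeXI's actual detection routine isn't shown in this thread. As a rough illustration of how a header-based "which caller produced this VCF?" check can work, here is a minimal Python sketch that scans the `##` meta-information lines. The function name `detect_variant_caller` and the header patterns it matches (GATK's `##GATKCommandLine...ID=<tool>` lines and a `##MuTect=` line for MuTect 1) are assumptions for illustration, not MuPeXI's code; it returns None when no known caller is found.

```python
import re
from typing import Iterable, Optional

# Hypothetical header patterns (assumption): real VCF headers vary between caller versions.
CALLER_PATTERNS = [
    ("MuTect2", re.compile(r"^##GATKCommandLine.*ID=Mutect2", re.IGNORECASE)),
    ("MuTect", re.compile(r"^##MuTect=", re.IGNORECASE)),
    ("HaplotypeCaller", re.compile(r"^##GATKCommandLine.*ID=HaplotypeCaller", re.IGNORECASE)),
]


def detect_variant_caller(lines: Iterable[str]) -> Optional[str]:
    """Return the first caller name matched in the '##' header lines, or None."""
    for line in lines:
        if not line.startswith("##"):
            # Meta-information lines come first; stop at '#CHROM' or the first record.
            break
        for name, pattern in CALLER_PATTERNS:
            if pattern.search(line):
                return name
    return None


if __name__ == "__main__":
    header = [
        "##fileformat=VCFv4.2",
        "##GATKCommandLine=<ID=HaplotypeCaller,Version=4.0.0.0>",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    print(detect_variant_caller(header))  # prints: HaplotypeCaller
```

Stopping at the first non-`##` line keeps the scan cheap on large files, since the caller information lives only in the header.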

To fix this and test out MuPeXI, I simply bypassed create_vep_compatible_vcf() in the setup and passed --format vcf into VEP. See here for the diff.

I'm wondering 1) whether this seems like a good idea, and 2) whether you would consider adding a flag to allow using the input VCF in VEP directly.
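A flag like the one suggested might look roughly like the sketch below. The option name `--skip-vep-compat`, the helper `build_vep_command`, the `vep` executable name and the `vep_compatible.vcf` file name are all hypothetical; only `-v` mirrors MuPeXI's VCF option as used later in this thread, and `--input_file`, `--output_file`, `--offline`, `--cache` and `--format vcf` are standard VEP options.

```python
import argparse
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of an opt-out for the VCF conversion step")
    parser.add_argument("-v", "--vcf", required=True, help="input VCF file")
    # Hypothetical flag: pass the input VCF straight to VEP instead of converting it first.
    parser.add_argument("--skip-vep-compat", action="store_true",
                        help="skip the VEP-compatibility conversion and run VEP with --format vcf")
    return parser.parse_args(argv)


def build_vep_command(vep_input: str, vep_output: str, force_vcf_format: bool) -> List[str]:
    """Assemble a VEP command line; the executable name 'vep' is an assumption."""
    cmd = ["vep", "--input_file", vep_input, "--output_file", vep_output, "--offline", "--cache"]
    if force_vcf_format:
        # Tell VEP explicitly that the input is VCF rather than relying on format auto-detection.
        cmd += ["--format", "vcf"]
    return cmd


if __name__ == "__main__":
    args = parse_args(["-v", "SRR1025675.vcf", "--skip-vep-compat"])
    if args.skip_vep_compat:
        vep_input = args.vcf  # use the caller's VCF unchanged
    else:
        vep_input = "vep_compatible.vcf"  # hypothetical output of the conversion step
    print(" ".join(build_vep_command(vep_input, "vep_output.txt", args.skip_vep_compat)))
    # prints: vep --input_file SRR1025675.vcf --output_file vep_output.txt --offline --cache --format vcf
```

Keeping the conversion as the default and making the bypass opt-in would preserve the existing behavior for older VEP versions.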

Thanks!

AC

ambj commented 6 years ago

Hi AC

Is the question related to issue #14? Can you provide a test VCF file and the error message MuPeXI outputs? I have tested MuPeXI with a VCF file originating from GATK 4.0 MuTect2 with no errors occurring (see #12).

The create_vep_compatible_vcf() function exists because the VEP versions used when MuPeXI was developed only accepted VCF files with a certain chromosome annotation. If you provide a snippet of a VCF that gives you this problem, I can test it and see how to avoid this in the future, and whether the create_vep_compatible_vcf() function has been made obsolete by later versions of VEP.

Best, /AM

acesnik commented 6 years ago

Sure! Here's a Dropbox link to the VCF file.

Here's the error message (attached as a screenshot).

It sounds like you're converting UCSC "chr#" chromosome annotations to Ensembl "#" annotations... Hope you can find the problem!
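The conversion AC describes, UCSC-style `chr1` names to Ensembl-style `1`, can be sketched as follows. This is not MuPeXI's create_vep_compatible_vcf(); the function names are hypothetical, and mapping `chrM` to Ensembl's `MT` is an assumption about the naming conventions in play. It rewrites both the CHROM column of data records and `##contig=<ID=...>` header lines, and is only meant for the primary chromosomes.

```python
import re
from typing import Iterable, Iterator

# Matches '##contig=<ID=chr...' and captures the name after the 'chr' prefix.
_CONTIG_RE = re.compile(r"^(##contig=<ID=)chr([^,>]+)")


def ucsc_to_ensembl(chrom: str) -> str:
    """Map a UCSC-style name ('chr1', 'chrX', 'chrM') to Ensembl style ('1', 'X', 'MT')."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return "MT" if name == "M" else name


def strip_chr_prefix(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines with chromosome names converted to Ensembl style."""
    for line in lines:
        if line.startswith("##contig="):
            match = _CONTIG_RE.match(line)
            if match:
                line = match.group(1) + ucsc_to_ensembl("chr" + match.group(2)) + line[match.end():]
            yield line
        elif line.startswith("#"):
            yield line  # other header lines pass through untouched
        else:
            fields = line.split("\t")
            fields[0] = ucsc_to_ensembl(fields[0])  # CHROM is the first column
            yield "\t".join(fields)
```

Because it works line by line on any iterable, it can be applied to an open file handle without loading the whole VCF into memory.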

AC

acesnik commented 6 years ago

I don't think this is related to #14. That issue has to do with VEP cache versions, whereas this has to do with VCF formatting.

ambj commented 6 years ago

Hmm, I cannot reproduce your error, but I do see a different one. I don't have VEP version 92 installed, so this was tested with version 87:

tuba[ambj]:/home/tuba/ambj/Projects/MuPeXI/data/testdata/20180503_github#16> /home/tuba/ambj/Projects/MuPeXI/GitHub/MuPeXI/MuPeXI.py -c /home/tuba/ambj/Projects/MuPeXI/bin/MuPeXI/config.ini -v SRR1025675.vcf

Reading in data
Creating proteome reference dictionary
Creating genome reference dictionary
Creating cancer genes list

VEP: Starting process for running the Ensembl Variant Effect Predictor
Detecting variant caller
Variant caller not detected in VCF file. NOTE: Genomic allele frequency is only taken into account with variant calls from MuTect or MuTect2!
Change VCF to the VEP compatible
Extracting allele frequencies
Running VEP
ERROR: VEP output file empty
VEP Can't use an undefined value as a symbol reference at
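One possible cause of "VEP output file empty" is an input VCF with a header but no variant records, which would match AC's report that the converted VCF only contained the header. A cheap guard is to count data records before launching VEP. The helpers `count_vcf_records` and `check_vep_input` below are hypothetical, not part of MuPeXI, and assume an uncompressed VCF.

```python
from typing import Iterable


def count_vcf_records(lines: Iterable[str]) -> int:
    """Count data lines, i.e. non-empty lines that do not start with '#'."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def check_vep_input(vcf_path: str) -> int:
    """Raise if the VCF handed to VEP has a header but no variant records."""
    with open(vcf_path, encoding="utf-8") as handle:
        n_records = count_vcf_records(handle)
    if n_records == 0:
        raise ValueError(f"{vcf_path}: no variant records, so VEP output would be empty")
    return n_records
```

Failing at this point gives a clearer message than a downstream error from VEP itself.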

I think two things are going on here: 1) the VEP version, and 2) your VCF file is from HaplotypeCaller and therefore contains germline variants rather than somatic mutations. I would recommend using MuTect2 for your variant calls, as it does not make sense to extract neopeptides from germline SNPs: you would then only be looking at differences between the individual and the rest of the population, not at the tumor-specific mutations.

acesnik commented 6 years ago

Okay, that makes sense. Thanks for the quick response!