ambj / MuPeXI

MuPeXI: the mutant peptide extractor and informer, a tool for predicting neo-epitopes from tumor sequencing data.

Allele_Frequency column is emitted empty #28

Open ShahiRB opened 5 years ago

ShahiRB commented 5 years ago

Hi, I wonder why the "Allele_Frequency" field is empty, i.e. "-", in the *.mupexi output. I am using a VCF from GATK4 Mutect2.

Regards, Raj

ShahiRB commented 5 years ago

Note that the TUMOR sample can be in col. 9 or col. 10 of the VCF file. The script assumes that it's in column 9, but with Mutect2 it can be in col. 10.

"Mutect2" should be added in line 497.
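
A minimal sketch of how the tumor column could be located instead of assumed. This is not MuPeXI's code: the function name `tumor_column_index` is hypothetical, and the sketch assumes the VCF header carries a `##tumor_sample=` line naming the tumor sample, which should be checked against your own file. It reads "col. 9 or 10" as zero-based indices, since the first nine VCF columns (CHROM through FORMAT) are fixed and the samples follow them.

```python
def tumor_column_index(header_lines):
    """Return the zero-based column index of the tumor sample, or None.

    Assumes the header contains a '##tumor_sample=<name>' line (an
    assumption about the caller's VCF, not something MuPeXI guarantees).
    """
    tumor_name = None
    columns = None
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("##tumor_sample="):
            # Everything after the first '=' is the sample name.
            tumor_name = line.split("=", 1)[1]
        elif line.startswith("#CHROM"):
            # The column header line is tab-separated.
            columns = line.split("\t")
    if tumor_name is None or columns is None:
        return None
    # Sample columns start at index 9, after the nine fixed columns.
    if tumor_name not in columns[9:]:
        return None
    return columns.index(tumor_name, 9)


fixed = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
normal_first = ["##tumor_sample=TUMOR", fixed + "\tNORMAL\tTUMOR"]
tumor_first = ["##tumor_sample=TUMOR", fixed + "\tTUMOR\tNORMAL"]
index_normal_first = tumor_column_index(normal_first)
index_tumor_first = tumor_column_index(tumor_first)
```

With the two toy headers above, the tumor sample lands at index 10 when the normal sample is listed first and at index 9 when the tumor sample is listed first.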

ibwoo commented 5 years ago

Hi @biotechnepal, if you're using GATK4, this issue has already been resolved: @ambj has pushed the fix to the master branch, but perhaps it is not yet in the release version.

A quick fix would be to check the ID entry in the VCF file header and change it to match the old, case-sensitive MuTect2 naming. MuPeXI looks for "ID=MuTect2", whereas GATK4 supplies "ID=Mutect2", if I'm not mistaken.

Once this is recognised correctly by MuPeXI, the allele frequencies should be correctly taken from your vcf file if everything else is in order.
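
As a rough sketch of that quick fix, the header could be patched with a few lines of Python. The function name `patch_mutect2_header` and the example header contents are hypothetical, and I haven't tested the result against MuPeXI, so treat it as an illustration of the string replacement only. It assumes an uncompressed, plain-text VCF.

```python
def patch_mutect2_header(lines):
    """Rewrite 'ID=Mutect2' to 'ID=MuTect2' in VCF meta lines only."""
    patched = []
    for line in lines:
        # Only touch meta-information lines ("##..."); leave the column
        # header and the variant records unchanged.
        if line.startswith("##"):
            line = line.replace("ID=Mutect2", "ID=MuTect2")
        patched.append(line)
    return patched


# Toy input; the header values here are made up for illustration.
example = [
    "##fileformat=VCFv4.2",
    "##GATKCommandLine=<ID=Mutect2,Version=hypothetical>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:AF\t0/0:0.0\t0/1:0.25",
]
patched = patch_mutect2_header(example)
```

For a real file you would read the lines from the VCF, pass them through the function, and write them to a new file, keeping the original untouched.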

I hope that's helpful.

ibwoo commented 5 years ago

@kobejamescurry this sounds like it should be a separate issue. Are you talking about the Yes/No column in the MuPeXI output? If so, MuPeXI just checks each mutant against (I think) the "symbol" column of the Cancer Gene Census file. It's basically just saying whether the mutation is in a gene that's associated with cancer. You could substitute that Cosmic file with any other gene list. In my case, I believe this file was a CSV, so if you have a VCF you may have downloaded the wrong file, or maybe I've misunderstood your question.

ibwoo commented 5 years ago

@kobejamescurry in that link you just sent me, I believe that if you click the 'csv' button and sign in or make an account, that's where you get the list in CSV format. Again, I'm not sure where you found the "cosmic vcf", but perhaps you mistyped.

ibwoo commented 5 years ago

@kobejamescurry It's not clear what your question is, if it was not answered by my earlier response. This 'Cancer Gene Census' list on Cosmic is used by MuPeXI as part of its additional 'Annotation' information. I've explained how the MuPeXI program accesses the downloaded .csv and checks the gene symbols against that list. It then adds something like "Yes" or "TRUE" (I can't remember off the top of my head) for a positive hit in the 'Cancer Driver Gene' column of the MuPeXI output.

This column is only used as extra annotation to help the end user, and so you can choose to filter that Cosmic gene list, supply your own or omit it entirely and it won't matter. If you are concerned that you want to have the exact same version of the list that the MuPeXI webserver uses, you would have to ask Anne-Mette, as Cosmic constantly updates their list (last updated in 2019) and it's unlikely that the MuPeXI webserver keeps it up to date.
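
To make the lookup described above concrete, here is a minimal sketch of the idea, not MuPeXI's actual code. The function names, the "Gene Symbol" column name, and the "Yes"/"No" output values are all assumptions for illustration; check the header of your own CSV and the MuPeXI output for the real ones.

```python
import csv
import io


def load_gene_symbols(csv_file, symbol_column="Gene Symbol"):
    """Collect the gene symbols from one column of a gene-list CSV.

    The column name is an assumption; pass whatever your file uses.
    """
    reader = csv.DictReader(csv_file)
    return {row[symbol_column].strip() for row in reader if row.get(symbol_column)}


def driver_flag(gene, symbols):
    """Return a Yes/No annotation depending on list membership."""
    return "Yes" if gene in symbols else "No"


# Toy gene list standing in for the downloaded CSV.
census_csv = io.StringIO(
    "Gene Symbol,Name\n"
    "TP53,tumor protein p53\n"
    "KRAS,KRAS proto-oncogene\n"
)
symbols = load_gene_symbols(census_csv)
flags = {gene: driver_flag(gene, symbols) for gene in ["TP53", "ACTB"]}
```

Because the check is only set membership on gene symbols, any other gene list with a symbol column would work the same way.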

ShahiRB commented 5 years ago

Hi @ibwoo, thanks!