aradenbaugh / radia

RADIA: RNA and DNA Integrated Analysis for Somatic Mutation Detection
GNU Affero General Public License v3.0

Help with interpreting the output VCF #11

Closed: aditisk closed this issue 4 years ago

aditisk commented 4 years ago

Hi @aradenbaugh, I have some questions about interpreting the merged VCF that gets generated after running all the steps in the pipeline.

  1. Which tool can I use to annotate this VCF with additional database annotations, if needed? I tried Funcotator, but it did not work for me because of a format discrepancy.

  2. I am using the Triple BAM method. What would be the easiest way to get the number of mutations in each of the categories shown in Fig. 3A of the paper? Currently I am parsing the VCF for the value of SST. Is this the correct way to do it?

Thank you once again for your help.

aradenbaugh commented 4 years ago

  1. I would recommend using SnpEff to annotate the RADIA VCF files. I have specifically tested versions 3.3 and 4.3.

  2. Figure 3 from the RADIA paper used separate validation data to calculate sensitivity and specificity. The Somatic Low/Med/High categories came from the validation data, as well as the "Ambiguous", "Germline", and "Not Validated" categories. If you just want to summarize your results, you can use the "SST" flag that you mentioned to get some of the different categories that were shown in Figure 3A. Here is a breakdown of the categories:

The results from the Triple BAM method are "Somatic RNA Confirmation" plus "Somatic RNA Rescue". For this kind of summary, you would probably want to focus on just the passing calls. If you're interested in RNA editing, you can look at lines with "SST=RADIATumEdit". You could also create the Somatic Low/Med/High categories by binning the AF (Allele Frequency) field of the Tumor DNA or Tumor RNA sample.
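A summary along these lines could be sketched in Python roughly as follows. This is only an illustration, not part of RADIA: it counts passing records per SST value and bins the AF of one sample. Several things here are assumptions to check against your own file: the sample column name (`DNA_TUMOR` by default), the AF entry listing the reference allele first and the alternate alleles after it, and the Low/Med/High cutoffs of 0.1 and 0.3, which are placeholders and not the thresholds used in the paper.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_info(info):
    """Turn a VCF INFO string into a dict; bare flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def sample_field(cols, sample_names, sample, key):
    """Return the raw FORMAT value `key` for `sample`, or None if absent."""
    if sample not in sample_names or len(cols) < 10:
        return None
    keys = cols[8].split(":")
    values = cols[9 + sample_names.index(sample)].split(":")
    if key not in keys:
        return None
    idx = keys.index(key)
    return values[idx] if idx < len(values) else None


def alt_af(raw):
    """Largest alternate-allele frequency in a comma-separated AF entry.

    Assumption: the first value belongs to the reference allele.
    """
    if raw is None:
        return None
    parts = raw.split(",")
    alts = parts[1:] if len(parts) > 1 else parts
    values = []
    for part in alts:
        try:
            values.append(float(part))
        except ValueError:
            continue  # skip missing values such as "."
    return max(values) if values else None


def af_bin(af, low=0.1, high=0.3):
    """Bin an allele frequency; the cutoffs are hypothetical placeholders."""
    if af is None:
        return "unknown"
    if af < low:
        return "low"
    if af < high:
        return "med"
    return "high"


def summarize(lines, sample="DNA_TUMOR", passing_only=True):
    """Count records per SST value and per AF bin for one sample."""
    sst_counts, af_counts = Counter(), Counter()
    sample_names = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Sample names follow the nine fixed VCF columns.
            sample_names = line.split("\t")[9:]
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue
        if passing_only and cols[6] != "PASS":
            continue
        sst_counts[parse_info(cols[7]).get("SST", "missing")] += 1
        raw_af = sample_field(cols, sample_names, sample, "AF")
        af_counts[af_bin(alt_af(raw_af))] += 1
    return sst_counts, af_counts
```

To use it on a file, open it with `open_vcf("merged.vcf")` (a hypothetical path) in a `with` block and pass the handle to `summarize`. Passing `sample="RNA_TUMOR"` would bin on the tumor RNA instead, again assuming that is the column name in your header.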

Hope this helps.

aditisk commented 4 years ago

Thank you so much for your suggestions and the detailed explanation. I really appreciate it.