broadinstitute / oncotator

Oncotator input VCF REF/ALT alleles do not match output alleles #340

Closed: crutching closed this issue 8 years ago

crutching commented 8 years ago

We ran across this issue while working with a bare-bones input VCF (CHROM, POS, REF, and ALT only; every other field is '.'). For instance:

1 162740326 . GT C . . .

results in:

1 162740326 . CT C .

We see similar behavior for C/TCT, which becomes C/CCT. Every example of this behavior I have found so far is an indel.

dwking2000 commented 8 years ago

The version of Oncotator we are running shows up in the output file as:

oncotator_version=Oncotatorv1.8.0.0|_Flat_File_Referencehg19|_GENCODE_v19CANONICAL|_UniProt_AAxform_201412|_COSMIC_v62291112|_dbNSFPv2.4|_1000gp320130502|_dbSNP_build142|_ESP6500SI-V2|_ESP6500SI-V2|_ClinVar12.03.20|_UniProt_AA_201412|_CCLE_By_GP09292010|_ORegAnno_UCSCTrack|_Ensembl_ICGCMUCOPA|_TCGAScape110405|_HGNCSept172014|_MutSig_Published_Results20110905|_Familial_Cancer_Genes20110905|_CCLE_By_Gene09292010|_gencode_xref_refseq_metadatav19|_HumanDNARepairGenes20110905|_COSMIC_FusionGenes_v62291112|_UniProt_201412|_COSMIC_Tissue291112|_TUMORScape20100104|_CGC_full2012-03-15|_ACHILLES_Lineage_Results110303

LeeTL1220 commented 8 years ago

Is this on the website or the standalone?

dwking2000 commented 8 years ago

Standalone

LeeTL1220 commented 8 years ago

What you've shown there is not a valid VCF. The REF and ALT must start with the reference base before the actual mutation. Oncotator is prepending the reference base, which is correct.
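The rule described in this comment (REF and ALT of an indel start with the same leading base) can be checked before annotation. The following is only a minimal sketch of such a pre-check; the function names are hypothetical and it is not part of Oncotator.

```python
def shares_leading_base(ref: str, alt: str) -> bool:
    """Return True if REF and ALT are non-empty and begin with the same base."""
    return bool(ref) and bool(alt) and ref[0].upper() == alt[0].upper()


def is_unanchored_indel(ref: str, alt: str) -> bool:
    """Flag alleles that differ in length but do not share a first base."""
    return len(ref) != len(alt) and not shares_leading_base(ref, alt)


# Alleles taken from this thread.
print(is_unanchored_indel("GT", "C"))   # input record from the report
print(is_unanchored_indel("CT", "C"))   # record as written by Oncotator
print(is_unanchored_indel("C", "TCT"))  # second example from the report
```

Under this check, both input records from the report (GT/C and C/TCT) are flagged, while the CT/C output is not. The check is purely syntactic: it says nothing about whether the leading base actually matches the reference genome.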

crutching commented 8 years ago

@LeeTL1220 Ah, yes, I overlooked that. I will have to bring this back to the developers of this particular variant caller.

I would say that changing GT/C to CT/C does not completely make sense. If you look at the reference, the base immediately preceding GT is G (AGGGTGT). So, you would write this as GGT/GC, certainly not CT/C. I understand that Oncotator is assuming the input VCF follows the spec, but I would think some quick validation should occur before modifying these fields and potentially grabbing incorrect annotations.
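The "quick validation" suggested here could look roughly like the sketch below: confirm that REF matches the reference at POS, then pad both alleles with the base that really precedes the event. This is an illustration only. The helper names are hypothetical, the toy contig is just the context quoted above, and the coordinates are made up rather than real hg19 positions.

```python
def ref_matches(reference: str, pos: int, ref: str) -> bool:
    """Check that REF equals the reference bases starting at 1-based POS."""
    start = pos - 1
    return reference[start:start + len(ref)].upper() == ref.upper()


def left_pad(reference: str, pos: int, ref: str, alt: str):
    """Prepend the reference base before POS to both alleles.

    Returns the adjusted (pos, ref, alt) tuple.
    """
    if pos < 2:
        raise ValueError("no preceding base at the start of the contig")
    if not ref_matches(reference, pos, ref):
        raise ValueError("REF does not match the reference at POS")
    anchor = reference[pos - 2]  # base immediately before the event
    return pos - 1, anchor + ref, anchor + alt


# Toy contig: the context quoted above, with GT placed at position 4.
toy = "AGGGTGT"
print(ref_matches(toy, 4, "CT"))    # False: CT is not what the reference holds here
print(left_pad(toy, 4, "GT", "C"))  # (3, 'GGT', 'GC')
```

With the real reference loaded in place of the toy string, the same comparison would catch a REF allele that disagrees with the genome before any annotation is attempted.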

dwking2000 commented 8 years ago

You can close this issue; we have determined that the input VCF is not valid. It would be good to open another issue about Oncotator allowing invalid data to be processed.