broadinstitute / oncotator


Oncotator issue with MuTect VCF #122

Open LeeTL1220 opened 10 years ago

LeeTL1220 commented 10 years ago

Please contact Lee if you need the VCF files that were attached to this email. Note that the db-dir was used incorrectly.

I fed a vcf created by mutect to oncotator, with only ref_hg in the datasource directory, and oncotator does not like it at all.

My command line was:

oncotator /opt/data/testbam/SRR305173.mutect.vcf SRR.out hg19 -i VCF -o VCF --log_name SRR.log --db-dir /opt/data/OncotatorData/oncotator_v1_ds/ref_hg

I can use other vcfs instead of the mutect one and they go through oncotator properly, even a vcf aligned to a b37 reference, so I’m quite puzzled as to what is the problem here. I’ve attached the input vcf, the output created by oncotator and the log file.

This is the traceback:

Traceback (most recent call last):
  File "/usr/local/bin/oncotator", line 9, in <module>
    load_entry_point('Oncotator==v1.0.1.0', 'console_scripts', 'oncotator')()
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/Oncotator.py", line 228, in main
    annotator.annotate()
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/Annotator.py", line 235, in annotate
    filename = self._outputRenderer.renderMutations(mutations, metadata=metadata, comments=comments)
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/output/VcfOutputRenderer.py", line 116, in renderMutations
    dataManager = OutputDataManager(self.configTable, mutations, comments, metadata, path)
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/output/OutputDataManager.py", line 41, in __init__
    self.mutation, self.mutations = self._fetchFirstMutation(muts)
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/output/OutputDataManager.py", line 56, in _fetchFirstMutation
    for mutation in muts:
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/Annotator.py", line 244, in _applyManualAnnotations
    for m in mutations:
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/Annotator.py", line 252, in _applyDefaultAnnotations
    for m in mutations:
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/Annotator.py", line 288, in _annotate_mutations_using_datasources
    for m in mutations:
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/input/VcfInputMutationCreator.py", line 290, in createMutations
    sampleMut = self._addGenotypeData2Mutation(sampleMut, record, index)
  File "/usr/local/lib/python2.7/site-packages/Oncotator-v1.0.1.0-py2.7.egg/oncotator/input/VcfInputMutationCreator.py", line 124, in _addGenotypeData2Mutation
    val = genotypeData[ID][index]
TypeError: 'float' object has no attribute '__getitem__'
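For readers following along: the last frame indexes `genotypeData[ID][index]`, which fails when the value stored for a FORMAT tag is a single float rather than a list. The sketch below reproduces that failure mode and shows one possible guard. It is only an illustration under that assumption: `genotype_value`, the `sample` dictionary, and its values are hypothetical and are not Oncotator's code or the contents of the attached VCF. Under Python 3 the error text reads "'float' object is not subscriptable" instead of the Python 2.7 message in the traceback.

```python
# Hypothetical sketch, not Oncotator's actual code.
def genotype_value(genotype_data, field_id, index):
    """Return the value for field_id at index, tolerating scalar fields."""
    value = genotype_data[field_id]
    if isinstance(value, (list, tuple)):
        return value[index]
    # Scalar (e.g. a single float): there is nothing to index into.
    return value


# Illustrative values only: one scalar field and one list-valued field.
sample = {"FA": 0.25, "AD": [30, 10]}

# Unguarded indexing of the scalar raises a TypeError, as in the traceback.
try:
    sample["FA"][0]
except TypeError as err:
    print("unguarded indexing failed:", err)

print(genotype_value(sample, "FA", 0))
print(genotype_value(sample, "AD", 1))
```

Whether returning the scalar unchanged is the right behaviour depends on what the tag means, so treat the guard as a starting point rather than a fix.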

johnbegemann commented 10 years ago

Lee - Sorry about the bad usage of --db-dir. I will update Nicky.

Also I am able to suppress these failures with the following:

(Oncotator_1.0.1.0)[jbegemann@jbc6 24April2014]$ more vcf.in.local.config
[NOT_SPLIT_TAGS]
FORMAT: FA,BQ
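The workaround above is an INI-style fragment: a `[NOT_SPLIT_TAGS]` section whose `FORMAT` key lists the tags `FA` and `BQ`. As a rough sketch of how such a fragment could be read, the snippet below parses it with Python's standard `configparser`. The helper `load_not_split_tags` is hypothetical, and this is not necessarily how Oncotator itself loads `vcf.in.local.config`.

```python
import configparser

# The config fragment quoted in the comment above.
CONFIG_TEXT = """
[NOT_SPLIT_TAGS]
FORMAT: FA,BQ
"""


def load_not_split_tags(text):
    """Hypothetical helper: map each field type to its set of tags."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    tags = {}
    if parser.has_section("NOT_SPLIT_TAGS"):
        for field_type, value in parser.items("NOT_SPLIT_TAGS"):
            # configparser lowercases keys by default, so restore upper case.
            names = {t.strip() for t in value.split(",") if t.strip()}
            tags[field_type.upper()] = names
    return tags


not_split = load_not_split_tags(CONFIG_TEXT)
print(not_split)
```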

LeeTL1220 commented 10 years ago

Is this still an issue?

LeeTL1220 commented 10 years ago

Talked to Geraldine and she shed some light on this issue. @elephanthunter talk to me before proceeding. This is a little more complicated than originally thought, but we have designed a simple solution.

elephanthunter commented 10 years ago

@LeeTL1220 This very issue is a bug. A summary of what we talked about: ensure that the #CHROM field values in the output VCF are mapped according to the encoded contig information. In all other cases, the VCF should be followed strictly.

vdauwera commented 10 years ago

Is this the issue we're using to discuss the VCF contig names handling? If so, the good news is I checked the chain file for hg19 to b37, and there is actually no difference between them apart from the contig names -- the contig length issue only applies relative to older builds (hg18, b36 etc). My turn to be embarrassed since I was being so dogmatic about the whole thing, apparently for nothing (which leaves me wondering why we don't do on-the-fly renaming inside GATK, instead of bothering with parallel bundles). So you don't need to worry about lifting over the datasources in any way. We can just focus on the solution of respecting user input for now, and later add the sequence dictionary enforcement if we want to give users the option to conform to a specific reference convention (which may need to special-case chrM -> MT).
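To make the renaming idea concrete, here is a minimal sketch of mapping contig names between the hg19 and b37 conventions, with chrM and MT handled as a special case. The function names and the `PRIMARY` set are hypothetical, not part of Oncotator or GATK. The sketch only renames the primary contigs (1-22, X, Y, and the mitochondrial contig), leaves every other name untouched, and does not check whether the underlying sequences match.

```python
# Hypothetical sketch: rename primary contigs only, leave the rest untouched.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def hg19_to_b37(contig):
    """Map an hg19-style name (chr1, chrM) to a b37-style name (1, MT)."""
    if contig == "chrM":
        return "MT"  # special case: the name is not just 'chr' + b37 name
    if contig.startswith("chr") and contig[3:] in PRIMARY:
        return contig[3:]
    return contig


def b37_to_hg19(contig):
    """Map a b37-style name (1, MT) to an hg19-style name (chr1, chrM)."""
    if contig == "MT":
        return "chrM"
    if contig in PRIMARY:
        return "chr" + contig
    return contig


for name in ["chr1", "chrX", "chrM", "chr1_gl000191_random"]:
    print(name, "->", hg19_to_b37(name))
```

Respecting user input, as proposed above, would mean applying a mapping like this only when matching against datasources and writing the original name back to the output.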

LeeTL1220 commented 10 years ago

Doh... I owe Alex Ramos 2 bucks.

chrM --> MT may still be an issue. I will have to remind myself what happens in Oncotator.

Many thanks for your help!



LeeTL1220 commented 10 years ago

@vdauwera @elephanthunter Can we close this issue?

LeeTL1220 commented 10 years ago

@vdauwera @elephanthunter Or at least remove the Public Release tag?

elephanthunter commented 10 years ago

Removing public release tag.