arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Issues building database ( KeyError: u'start_retained_variant') #909

Open hoppmann opened 5 years ago

hoppmann commented 5 years ago

Hi,

Occasionally I run into an issue while building the database; see the error log at the end. A few times it worked when I disabled multithreading, but most of the time the error persists. In the end I still have a working DB, but I guess that some variants will be missing, since one chunk won't be integrated. Am I right about that? Do you have any suggestions on how to get rid of the error, or what could be causing it? Thanks in advance.

My command: /bin/ngs/gemini/anaconda/bin/gemini load -t all --cores 12 -v 02-annotated/merged-vep-snpeff.vcf.gz 03-databases/merged.db

CADD scores are being loaded (to skip use:--skip-cadd).
GERP per bp is being loaded (to skip use:--skip-gerp-bp).
Loading 57467 variants.
Breaking /dsk/data1/studies/01_ngs/01_Kinderklinik/proton/2018-01-24_Proton_Run34/02-annotated/merged-vep-snpeff.vcf.gz into 12 chunks.
Loading chunk 0.
Loading chunk 1.
Loading chunk 2.
Loading chunk 3.
Loading chunk 4.
Loading chunk 5.
Loading chunk 6.
Loading chunk 7.
Loading chunk 8.
Loading chunk 9.
Loading chunk 10.
Loading chunk 11.
[W::vcf_parse] contig 'chr1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr4' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr2' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr8' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr14' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr18' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr6' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr11' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr16' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr19' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr3' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr20' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr12' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr5' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr19' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr17' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr2' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr15' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr11' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr7' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr9' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr21' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr22' is not defined in the header. (Quick workaround: index the file with tabix.)
Traceback (most recent call last):
  File "/data/programs/bin/ngs/gemini/anaconda/bin//gemini", line 7, in <module>
    gemini_main.main()
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1249, in main
    args.func(parser, args)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 311, in loadchunk_fn
    gemini_load_chunk.load(parser, args)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 914, in load
    gemini_loader.populate_from_vcf()
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 219, in populate_from_vcf
    (variant, variant_impacts) = self._prepare_variation(var, anno_keys)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 495, in _prepare_variation
    top_impact = geneimpacts.Effect.top_severity(impacts)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/geneimpacts/effect.py", line 305, in top_severity
    effects = sorted(effects)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/functools.py", line 60, in <lambda>
    ('__lt__', lambda self, other: self <= other and not self == other),
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/geneimpacts/effect.py", line 270, in __le__
    if self.severity != other.severity:
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/geneimpacts/effect.py", line 355, in severity
    v = max(lookup[sev[csq]] for csq in self.consequences)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/geneimpacts/effect.py", line 355, in <genexpr>
    v = max(lookup[sev[csq]] for csq in self.consequences)
KeyError: u'start_retained_variant'
[W::vcf_parse] contig 'chr16' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr6' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr4' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr13' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrX' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr8' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr14' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrY' is not defined in the header. (Quick workaround: index the file with tabix.)
pid 5878: 4799 variants processed.
pid 5851: 4788 variants processed.
pid 5858: 4788 variants processed.
pid 5876: 4788 variants processed.
pid 5864: 4788 variants processed.
pid 5845: 4788 variants processed.
pid 5860: 4788 variants processed.
pid 5867: 4788 variants processed.
pid 5848: 4788 variants processed.
pid 5870: 4788 variants processed.
pid 5854: 4788 variants processed.
Traceback (most recent call last):
  File "/data/programs/bin/ngs/gemini/anaconda/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1249, in main
    args.func(parser, args)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 204, in load_fn
    gemini_load.load(parser, args)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 49, in load
    load_multicore(args)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 93, in load_multicore
    chunks = load_chunks_multicore(grabix_file, args)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 264, in load_chunks_multicore
    wait_until_finished(procs)
  File "/data/programs/bin/ngs/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_load.py", line 359, in wait_until_finished
    raise ValueError("Processing failed on GEMINI chunk load")
ValueError: Processing failed on GEMINI chunk load
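
For readers hitting the same traceback: the frame that fails in the first traceback is a plain dictionary lookup per consequence term (`lookup[sev[csq]] for csq in self.consequences`), so any term missing from the severity table raises `KeyError`. The sketch below reproduces that failure mode and lists the terms of a consequence field that a table does not know. The `SEVERITY` and `RANK` tables and the helper names `top_rank` and `unknown_terms` are made up for illustration and are not geneimpacts' actual data or API; the only format assumption is that VEP joins multiple consequence terms with `&`.

```python
from __future__ import print_function

# Hypothetical tables for illustration only; NOT the real geneimpacts tables.
SEVERITY = {
    "stop_gained": "HIGH",
    "missense_variant": "MED",
    "synonymous_variant": "LOW",
}
RANK = {"LOW": 1, "MED": 2, "HIGH": 3}


def top_rank(consequences, severity=SEVERITY, rank=RANK):
    """Mirror the failing expression: one unguarded dict lookup per term."""
    return max(rank[severity[csq]] for csq in consequences)


def unknown_terms(csq_field, severity=SEVERITY):
    """Return the '&'-separated terms of one consequence field missing from the table."""
    return [term for term in csq_field.split("&") if term and term not in severity]


if __name__ == "__main__":
    field = "start_retained_variant&missense_variant"
    print("unknown terms:", unknown_terms(field))
    try:
        top_rank(field.split("&"))
    except KeyError as err:
        # Same exception type as in the log above.
        print("KeyError:", err)
```

Running a check like `unknown_terms` over the consequence fields of the annotated VCF before loading could show which terms the installed table lacks, under the assumption that the table can be inspected at all.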

brentp commented 5 years ago

If you update your geneimpacts module (pip install -U geneimpacts), this should be resolved; a more recent version fixes it.

hoppmann commented 5 years ago

Hi, unfortunately this didn't help. I'm still getting the same error.
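
One thing that may be worth ruling out, although nothing in this thread confirms it is the cause: `pip install -U geneimpacts` upgrades the package for whichever Python the `pip` on the PATH belongs to, which is not necessarily the anaconda interpreter that runs `gemini`. A minimal sketch for checking which copy of a module an interpreter actually imports follows; `describe_module` is a hypothetical helper, and whether geneimpacts exposes `__version__` is an assumption, hence the `getattr` fallback.

```python
from __future__ import print_function

import importlib
import sys


def describe_module(name):
    """Return (version, file path) of the module as imported by this interpreter."""
    module = importlib.import_module(name)
    # __version__ is not guaranteed to exist, so fall back to "unknown".
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "unknown")
    return version, path


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    try:
        print("geneimpacts:", describe_module("geneimpacts"))
    except ImportError as err:
        print("geneimpacts is not importable here:", err)
```

To be meaningful, the script would need to be run with the Python that sits next to the `gemini` executable (assumed here to be in the same `anaconda/bin` directory as in the command above), so that the reported path can be compared with the `site-packages/geneimpacts` path in the traceback.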