bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

gemini db load error #781

Closed: dwaggott closed this issue 9 years ago

dwaggott commented 9 years ago

One of my joint-calling runs (n=334 exomes) is failing at the gemini load step. This is a different dataset from the one reported in #773. The first suspicious lines in the log are below. The final directory is created, but the database is only 137M. The log also ends with a memory warning, so I'll have to track that down. Currently I'm allocating 8G for the controller and 24G for the submission script.

slurmstepd: Exceeded step memory limit at some point. Step may have been partially swapped out to disk

[2015-03-06T22:56Z] sh-5-21.local: Multi-allelic to single allele
[2015-03-06T22:56Z] sh-5-21.local: decompose v0.5
[2015-03-06T22:56Z] sh-5-21.local: 
[2015-03-06T22:56Z] sh-5-21.local: options:     input VCF file        /scratch/PI/euan/projects/aric/bcbio/project_aric_euro-fb-joint/work/gemini/batch2-freebayes-joint-multiallelic.vcf.gz
[2015-03-06T22:56Z] sh-5-21.local:          [o] output VCF file       -
[2015-03-06T22:56Z] sh-5-21.local: 
[2015-03-06T22:57Z] sh-5-21.local: 
[2015-03-06T22:57Z] sh-5-21.local: stats: no. variants                 : 161146
[2015-03-06T22:57Z] sh-5-21.local:        no. biallelic variants       : 29732
[2015-03-06T22:57Z] sh-5-21.local:        no. multiallelic variants    : 131414
[2015-03-06T22:57Z] sh-5-21.local: 
[2015-03-06T22:57Z] sh-5-21.local:        no. additional biallelics    : 198207
[2015-03-06T22:57Z] sh-5-21.local:        total no. of biallelics      : 359353
[2015-03-06T22:57Z] sh-5-21.local: 
[2015-03-06T22:57Z] sh-5-21.local: Time elapsed: 28.19s
[2015-03-06T22:57Z] sh-5-21.local: 
[2015-03-06T22:57Z] sh-5-21.local: tabix index batch2-freebayes-joint-multiallelic-decompose.vcf.gz
[2015-03-06T22:57Z] sh-5-21.local: snpEff effects : SRR858538
[2015-03-06T23:07Z] sh-5-21.local: tabix index batch2-freebayes-joint-multiallelic-decompose-effects.vcf.gz
[2015-03-06T23:13Z] sh-5-21.local: bgzip batch2-freebayes-joint-nomultiallelic.vcf
[2015-03-06T23:22Z] sh-5-21.local: tabix index batch2-freebayes-joint-nomultiallelic.vcf.gz
[2015-03-06T23:23Z] sh-5-21.local: Create gemini database for /scratch/PI/euan/projects/aric/bcbio/project_aric_euro-fb-joint/work/gemini/batch2-freebayes-joint-nomultiallelic.vcf.gz : SRR858538
[2015-03-06T23:23Z] sh-5-21.local: CADD scores are being loaded (to skip use:--skip-cadd).
[2015-03-06T23:25Z] sh-5-21.local: Traceback (most recent call last):
[2015-03-06T23:25Z] sh-5-21.local:   File "/share/PI/euan/apps/bin/gemini", line 6, in <module>
[2015-03-06T23:25Z] sh-5-21.local:     gemini.gemini_main.main()
[2015-03-06T23:25Z] sh-5-21.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1105, in main
[2015-03-06T23:25Z] sh-5-21.local:     args.func(parser, args)
[2015-03-06T23:25Z] sh-5-21.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 226, in loadchunk_fn
[2015-03-06T23:25Z] sh-5-21.local:     gemini_load_chunk.load(parser, args)
[2015-03-06T23:25Z] sh-5-21.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 681, in load
[2015-03-06T23:25Z] sh-5-21.local:     gemini_loader.populate_from_vcf()
[2015-03-06T23:25Z] sh-5-21.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 100, in populate_from_vcf
[2015-03-06T23:25Z] sh-5-21.local:     (variant, variant_impacts, extra_fields) = self._prepare_variation(var)
[2015-03-06T23:25Z] sh-5-21.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 311, in _prepare_variation
[2015-03-06T23:25Z] sh-5-21.local:     (cadd_raw, cadd_scaled) = annotations.get_cadd_scores(var)
[2015-03-06T23:25Z] sh-5-21.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/annotations.py", line 428, in get_cadd_scores
[2015-03-06T23:25Z] sh-5-21.local:     len(var.ALT[0]) == 1:
[2015-03-06T23:25Z] sh-5-21.local: TypeError: object of type 'NoneType' has no len()
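The traceback fails at `len(var.ALT[0])`, which is what you would expect if the parser represents a site with no alternate allele (ALT column `.`) as `[None]`, the convention PyVCF uses. The sketch below assumes that convention and shows a None-safe form of this kind of single-base check. `parse_alt` and `is_snp_for_cadd` are hypothetical helpers for illustration, not GEMINI functions, and this is not necessarily how the GEMINI fix was written.

```python
from typing import List, Optional


def parse_alt(alt_field: str) -> List[Optional[str]]:
    """Split a VCF ALT column, mapping the missing value '.' to None.

    Hypothetical helper: it mirrors the assumed PyVCF-style convention
    where a site with no alternate allele yields [None].
    """
    return [None if a == "." else a for a in alt_field.split(",")]


def is_snp_for_cadd(ref: str, alts: List[Optional[str]]) -> bool:
    """Hypothetical None-safe single-base check.

    Calling len() on None raises the TypeError seen in the log,
    so missing ALT alleles are handled before any length test.
    """
    if not alts or alts[0] is None:
        # No alternate allele: there is no substitution to score.
        return False
    return len(ref) == 1 and len(alts[0]) == 1
```

For example, `is_snp_for_cadd("A", parse_alt("."))` returns `False` instead of raising, while `is_snp_for_cadd("A", parse_alt("G"))` returns `True`.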
dwaggott commented 9 years ago

I checked for sites with missing ALTs and there are quite a few, i.e. sites where coverage is below the calling threshold. Should these be filtered?

bcftools view -M1 batch2-freebayes-joint-nomultiallelic.vcf.gz | less
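The `-M1` selection above lists sites with only a REF allele. If the answer is to filter them out before loading, the inverse selection can be sketched in plain Python. This assumes an uncompressed VCF text stream (a bgzipped file can be read with `gzip.open(path, "rt")`); `has_alt` and `drop_no_alt_sites` are hypothetical names, and this is an illustration rather than what bcbio does.

```python
from typing import Iterable, Iterator

# 0-based index of ALT in a VCF data line: CHROM POS ID REF ALT ...
ALT_COLUMN = 4


def has_alt(line: str) -> bool:
    """True if a VCF data line lists at least one alternate allele."""
    fields = line.rstrip("\n").split("\t")
    return fields[ALT_COLUMN] != "."


def drop_no_alt_sites(lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through; keep only data lines with an ALT allele."""
    for line in lines:
        if line.startswith("#") or has_alt(line):
            yield line
```

Streaming line by line keeps memory flat regardless of cohort size, which matters for a 334-sample joint VCF.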

chapmanb commented 9 years ago

Daryl; I'd pushed a fix to the GEMINI development version earlier which should hopefully resolve this issue. Do you have the latest development version of gemini on that machine? You should be able to update with:

gemini update --devel

Sorry about the issues; hopefully this fixes it.

dwaggott commented 9 years ago

Probably not; this run was stuck in the queue for a while. I'll retry.

On Sat, Mar 7, 2015 at 2:54 AM, Brad Chapman notifications@github.com wrote:

Daryl; I'd pushed a fix to the GEMINI development version earlier which should hopefully resolve this issue. Do you have the latest development version of gemini on that machine? You should be able to update with:

gemini update --devel

Sorry about the issues; hopefully this fixes it.


dwaggott commented 9 years ago

Output from the update (it worked on the second attempt):

Requirement already satisfied (use --upgrade to upgrade): setuptools in ./bcbio/anaconda/lib/python2.7/site-packages/setuptools-13.0.2-py2.7.egg (from pydot->python-graph-dot>=1.8.2->-r https://raw.githubusercontent.com/arq5x/gemini/master/requirements.txt (line 9))
Installing latest GEMINI development version
Collecting git+https://github.com/arq5x/gemini.git
  Cloning https://github.com/arq5x/gemini.git to /tmp/pip-bkVrMd-build
    /share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/setuptools-13.0.2-py2.7.egg/setuptools/dist.py:282: UserWarning: Normalizing '0.11.1a' to '0.11.1a0'
Installing collected packages: gemini
  Found existing installation: gemini 0.11.0
    Uninstalling gemini-0.11.0:
      Successfully uninstalled gemini-0.11.0
  Running setup.py install for gemini
    /share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/setuptools-13.0.2-py2.7.egg/setuptools/dist.py:282: UserWarning: Normalizing '0.11.1a' to '0.11.1a0'
    changing mode of build/scripts-2.7/gemini from 644 to 755
    changing mode of /share/PI/euan/apps/bcbio/anaconda/bin/gemini to 755
Successfully installed gemini-0.11.1a0
Gemini upgraded to latest version
Traceback (most recent call last):
  File "/share/PI/euan/apps/bin/gemini", line 6, in <module>
    gemini.gemini_main.main()
  File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1105, in main
    default=None,
  File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 996, in update_fn

  File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_update.py", line 60, in release
    cbl = get_cloudbiolinux(cbl_repo)
  File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_update.py", line 72, in _get_install_script
    if not os.path.exists(test_dir) or os.path.isdir(test_dir):
IOError: zipimport: can not open file /share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/setuptools-12.3-py2.7.egg
chapmanb commented 9 years ago

Daryl; Sorry about the error. It looks like the code did get updated, so you should be good to go. If not, re-running the update should finish cleanly. Sometimes pip removes libraries as part of the upgrade, which can cause these errors. We haven't been able to isolate every case, so we still see them intermittently. Hopefully a re-run does it.

dwaggott commented 9 years ago

Still looks to have failed loading. What would help for debugging?

[2015-03-07T21:17Z] sh-5-31.local: Create gemini database for /scratch/PI/euan/projects/aric/bcbio/project_aric_euro-fb-joint/work/gemini/batch2-freebayes-joint-nomultiallelic.vcf.gz : SRR858538
[2015-03-07T21:17Z] sh-5-31.local: CADD scores are being loaded (to skip use:--skip-cadd).
[2015-03-07T21:18Z] sh-5-31.local: Traceback (most recent call last):
[2015-03-07T21:18Z] sh-5-31.local:   File "/share/PI/euan/apps/bin/gemini", line 6, in <module>
[2015-03-07T21:18Z] sh-5-31.local:     gemini.gemini_main.main()
[2015-03-07T21:18Z] sh-5-31.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1139, in main
[2015-03-07T21:18Z] sh-5-31.local:     args.func(parser, args)
[2015-03-07T21:18Z] sh-5-31.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 225, in loadchunk_fn
[2015-03-07T21:18Z] sh-5-31.local:     gemini_load_chunk.load(parser, args)
[2015-03-07T21:18Z] sh-5-31.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 685, in load
[2015-03-07T21:18Z] sh-5-31.local:     gemini_loader.populate_from_vcf()
[2015-03-07T21:18Z] sh-5-31.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 105, in populate_from_vcf
[2015-03-07T21:18Z] sh-5-31.local:     (variant, variant_impacts, extra_fields) = self._prepare_variation(var)
[2015-03-07T21:18Z] sh-5-31.local:   File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 457, in _prepare_variation
[2015-03-07T21:18Z] sh-5-31.local:     vcf_id, self.v_id, anno_id, var.REF, ','.join(var.ALT),
[2015-03-07T21:18Z] sh-5-31.local: TypeError: sequence item 0: expected string, NoneType found
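This second failure is the same missing-ALT case surfacing at a different step: `','.join(var.ALT)` raises `TypeError` when the list holds `None` (again assuming the PyVCF-style `[None]` representation for an ALT of `.`). One way a loader could avoid it is to write the VCF missing-value marker back out when storing ALT as text. `alt_to_text` is a hypothetical helper, not the actual GEMINI patch.

```python
from typing import List, Optional


def alt_to_text(alts: List[Optional[str]], missing: str = ".") -> str:
    """Hypothetical helper: join ALT alleles for storage.

    str.join() rejects None items, which is the error in the log;
    each None is replaced with the VCF missing-value marker first.
    """
    return ",".join(missing if a is None else a for a in alts)
```

With this, a no-ALT site is stored as `.` and a multi-allelic ALT such as `["G", "T"]` as `G,T`.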
chapmanb commented 9 years ago

Daryl; Sorry about that failure. I managed to replicate it and pushed a fix to GEMINI, so if you update to the development version one last time:

gemini update --devel

it should hopefully finish cleanly now. Thanks for all the patience and please let us know if you run into anything else.