Closed dwaggott closed 9 years ago
I checked for sites with missing ALTs and there are quite a few, i.e. where the coverage is below the calling threshold. Should these be filtered?
bcftools view -M1 batch2-freebayes-joint-nomultiallelic.vcf.gz | less
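For a quick count rather than paging through the records, the same check can be sketched in Python. This is only an illustration: it assumes a standard tab-separated VCF where a missing ALT is written as `.`, and the function names `count_missing_alt` and `count_missing_alt_in_file` are made up for this example, not part of bcbio or GEMINI.

```python
import gzip


def count_missing_alt(lines):
    """Return (missing, total): data records with ALT == '.' and all data records."""
    total = 0
    missing = 0
    for line in lines:
        # Skip blank lines and VCF header lines (## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # ALT is the fifth VCF column; '.' is the VCF missing value.
        if fields[4] == ".":
            missing += 1
    return missing, total


def count_missing_alt_in_file(path):
    """Open a plain or gzip/bgzip-compressed VCF and count missing-ALT records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_missing_alt(handle)
```

Calling `count_missing_alt_in_file("batch2-freebayes-joint-nomultiallelic.vcf.gz")` would then give the number of ALT-less sites next to the total record count.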
Daryl; I'd pushed a fix to the GEMINI development version earlier which should hopefully resolve this issue. Do you have the latest development version of gemini on that machine? You should be able to update with:
gemini update --devel
Sorry about the issues and hope this will fix it.
Probably not, this run was stuck in the queue for a bit. I'll retry.
Update messages are below (it worked on the second attempt).
Requirement already satisfied (use --upgrade to upgrade): setuptools in ./bcbio/anaconda/lib/python2.7/site-packages/setuptools-13.0.2-py2.7.egg (from pydot->python-graph-dot>=1.8.2->-r https://raw.githubusercontent.com/arq5x/gemini/master/requirements.txt (line 9))
Installing latest GEMINI development version
Collecting git+https://github.com/arq5x/gemini.git
Cloning https://github.com/arq5x/gemini.git to /tmp/pip-bkVrMd-build
/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/setuptools-13.0.2-py2.7.egg/setuptools/dist.py:282: UserWarning: Normalizing '0.11.1a' to '0.11.1a0'
Installing collected packages: gemini
Found existing installation: gemini 0.11.0
Uninstalling gemini-0.11.0:
Successfully uninstalled gemini-0.11.0
Running setup.py install for gemini
/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/setuptools-13.0.2-py2.7.egg/setuptools/dist.py:282: UserWarning: Normalizing '0.11.1a' to '0.11.1a0'
changing mode of build/scripts-2.7/gemini from 644 to 755
changing mode of /share/PI/euan/apps/bcbio/anaconda/bin/gemini to 755
Successfully installed gemini-0.11.1a0
Gemini upgraded to latest version
Traceback (most recent call last):
File "/share/PI/euan/apps/bin/gemini", line 6, in <module>
gemini.gemini_main.main()
File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1105, in main
default=None,
File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 996, in update_fn
File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_update.py", line 60, in release
cbl = get_cloudbiolinux(cbl_repo)
File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_update.py", line 72, in _get_install_script
if not os.path.exists(test_dir) or os.path.isdir(test_dir):
IOError: zipimport: can not open file /share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/setuptools-12.3-py2.7.egg
Daryl; Sorry about the error. It looks like the code did get updated, so you should be good to go. Alternatively, if you re-run the update it should finish cleanly. pip sometimes removes libraries as part of the upgrade, which can cause these errors. We haven't been able to isolate all of the cases, so they show up intermittently. Hope a re-run does it.
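One way to confirm that the upgrade took effect despite the traceback is to ask the Python environment which gemini distribution it sees. The sketch below is an assumption-laden illustration: it uses `importlib.metadata`, which needs Python 3.8 or newer (the install in this thread is Python 2.7), and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    # Per the pip output above, this should report the development version
    # if the update succeeded; None means gemini is not installed here.
    print(installed_version("gemini"))
```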
Still looks to have failed loading. What would help for debugging?
[2015-03-07T21:17Z] sh-5-31.local: Create gemini database for /scratch/PI/euan/projects/aric/bcbio/project_aric_euro-fb-joint/work/gemini/batch2-freebayes-joint-nomultiallelic.vcf.gz : SRR858538
[2015-03-07T21:17Z] sh-5-31.local: CADD scores are being loaded (to skip use:--skip-cadd).
[2015-03-07T21:18Z] sh-5-31.local: Traceback (most recent call last):
[2015-03-07T21:18Z] sh-5-31.local: File "/share/PI/euan/apps/bin/gemini", line 6, in <module>
[2015-03-07T21:18Z] sh-5-31.local: gemini.gemini_main.main()
[2015-03-07T21:18Z] sh-5-31.local: File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1139, in main
[2015-03-07T21:18Z] sh-5-31.local: args.func(parser, args)
[2015-03-07T21:18Z] sh-5-31.local: File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 225, in loadchunk_fn
[2015-03-07T21:18Z] sh-5-31.local: gemini_load_chunk.load(parser, args)
[2015-03-07T21:18Z] sh-5-31.local: File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 685, in load
[2015-03-07T21:18Z] sh-5-31.local: gemini_loader.populate_from_vcf()
[2015-03-07T21:18Z] sh-5-31.local: File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 105, in populate_from_vcf
[2015-03-07T21:18Z] sh-5-31.local: (variant, variant_impacts, extra_fields) = self._prepare_variation(var)
[2015-03-07T21:18Z] sh-5-31.local: File "/share/PI/euan/apps/bcbio/anaconda/lib/python2.7/site-packages/gemini/gemini_load_chunk.py", line 457, in _prepare_variation
[2015-03-07T21:18Z] sh-5-31.local: vcf_id, self.v_id, anno_id, var.REF, ','.join(var.ALT),
[2015-03-07T21:18Z] sh-5-31.local: TypeError: sequence item 0: expected string, NoneType found
Daryl; Sorry about that failure. I managed to replicate it and pushed a fix to GEMINI, so if you update to the development version one last time:
gemini update --devel
it should hopefully finish cleanly now. Thanks for all the patience and please let us know if you run into anything else.
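For anyone reading the traceback above: the TypeError comes from `','.join(var.ALT)` receiving a list whose first item is `None`, which is consistent with a record that has no ALT allele. A minimal sketch of a defensive join follows. It is not the fix that was pushed to GEMINI; `join_alts` is a hypothetical helper, and writing `.` for a missing allele is an assumption based on the VCF missing-value convention.

```python
def join_alts(alts, missing="."):
    """Join ALT alleles with commas, writing `missing` for None entries."""
    # An empty or absent ALT list is treated as a single missing value.
    if not alts:
        return missing
    return ",".join(missing if alt is None else str(alt) for alt in alts)


if __name__ == "__main__":
    # Reproduce the failure: str.join rejects None items.
    try:
        ",".join([None])
    except TypeError as exc:
        print("plain join failed:", exc)

    print(join_alts([None]))      # prints "."
    print(join_alts(["T", "G"]))  # prints "T,G"
```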
One of my joint calling runs (n=334 exomes) is failing the gemini load step. This is a different dataset than was reported in #773. The first lines that look suspicious in the log are below. The final directory gets added but the db is only 137M. The log does end in a memory warning so I'll have to try and track that down. Currently, I'm at 8G for the controller and 24G for the submission script.
slurmstepd: Exceeded step memory limit at some point. Step may have been partially swapped out to disk