Open 8nb24 opened 5 years ago
Hmm. I'll have a look and see if I can reduce the memory use a bit, or see why this might be happening. Even with 1800 samples, it shouldn't use much memory.
Thanks for looking. I got the same error both with and without the bcolz index created. I checked the node statistics, and memory in use never exceeded 3.5 GB.
Could you add --filter "gene != ''" to your comp_hets call? Or, if you already have a --filter, append AND gene != '' to it. Let me know if that reduces the memory use.
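For concreteness, a sketch of what the suggested call might look like. The database name is a placeholder, and this snippet only builds and prints the command rather than running it; adapt it to your actual invocation:

```shell
# Placeholder database name -- substitute your own file.
DB=my_cohort.db

# The filter suggested above: drop rows with no gene annotation, so
# comp_hets never has to group the (potentially huge) empty-gene bucket.
FILTER="gene != ''"

# Build and show the command; run it directly once the pieces look right.
CMD="gemini comp_hets --filter \"$FILTER\" $DB"
echo "$CMD"
```

If you already pass a --filter, fold the clause in instead, e.g. `--filter "impact_severity != 'LOW' AND gene != ''"`.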
Output of gemini --version: gemini 0.20.1...
When running gemini comp_hets on a large database (~1800 WGS individuals), I get the following error:

```
Traceback (most recent call last):
  File "/usr/local/apps/gemini/0.20.1/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 1248, in main
    args.func(parser, args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 710, in comp_hets_fn
    CompoundHet(args).run()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 307, in run
    for i, s in enumerate(self.report_candidates()):
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 213, in report_candidates
    for gene, li in self.candidates():
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 459, in candidates
    for grp, li in self.gen_candidates('gene'):
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 115, in gen_candidates
    self.gq.run(q, needs_genotypes=True)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 653, in run
    self.result_proxy = res = iter(self._apply_query())
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 907, in _apply_query
    res = self._execute_query()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 879, in _execute_query
    res = self.conn.execute(sql.text(self.query))
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1176, in execute
    bind, close_with_result=True).execute(clause, params or {})
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
MemoryError
```
I attempted to run this command on a large-memory node allocated specifically to this task, without success. Is there an alternative way to store the database that would alleviate this issue, or would you advise something else?
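As a hypothetical first diagnostic (not something suggested in the thread), it may be worth counting how many variant rows have no gene annotation, since those are exactly the rows that --filter "gene != ''" removes. The database name below is a placeholder, and the snippet only assembles and prints the command:

```shell
# Placeholder database name -- substitute your own file.
DB=my_cohort.db

# Count variants with no gene annotation; if this number is large, those
# rows all fall into one enormous group when comp_hets groups by gene.
Q="SELECT count(*) FROM variants WHERE gene IS NULL OR gene = ''"

# gemini query -q runs SQL against the database; build and show the call.
CMD="gemini query -q \"$Q\" $DB"
echo "$CMD"
```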