autapomorphy closed this issue 4 years ago
You can turn off multiprocessing in the config to save some memory.
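As a rough illustration of why this helps, here is a minimal sketch of the general pattern, not the library's actual code. The names `PreprocessConfig`, `multiprocess`, `num_workers`, `tokenize`, and `preprocess` are all hypothetical; check the project's params for the real setting. The sketch uses the standard library's `concurrent.futures` rather than joblib to stay self-contained.

```python
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass


@dataclass
class PreprocessConfig:
    # Hypothetical flag for illustration; not the library's real parameter name.
    multiprocess: bool = True
    num_workers: int = 4


def tokenize(example: str) -> list:
    # Stand-in for the real per-example preprocessing step.
    return example.split()


def preprocess(examples, config):
    if not config.multiprocess:
        # Serial path: everything runs in the current process,
        # so no extra worker processes are created.
        return [tokenize(e) for e in examples]
    # Parallel path: each worker is a separate process with its own memory.
    with ProcessPoolExecutor(max_workers=config.num_workers) as pool:
        return list(pool.map(tokenize, examples))


if __name__ == "__main__":
    config = PreprocessConfig(multiprocess=False)
    print(preprocess(["a b c", "d e"], config))
```

With the flag off, preprocessing is slower but peak memory is usually lower, which may be enough to keep the operating system from killing the process.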
On Sat, Apr 25, 2020, 9:46 AM autapomorphy notifications@github.com wrote:
I tried to run the notebook Run Pre-defined problems.ipynb. After calling
train_bert_multitask(problem='weibo_ner&weibo_cws', num_gpus=1, num_epochs=3)
I got this error message:
Traceback (most recent call last):
  File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/params.py", line 206, in assign_problem
    self.get_data_info(self.problem_list, self.ckpt_dir)
  File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/params.py", line 270, in get_data_info
    list(self.read_data_fnproblem http://self,))
  File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/create_generators.py", line 300, in create_single_problem_generator
    example_list=example) for example in example_list
  File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/parallel.py", line 1017, in __call__
    self.retrieve()
  File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/parallel.py", line 909, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 562, in wrap_future_result
    return future.result(timeout=timeout)
  File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/cluster/tufts/**/lib/anaconda3/envs/1001-nlp/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}
How much RAM do I need?