deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself
https://coder.deepseek.com/
MIT License
6.01k stars, 433 forks

ERROR: ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers' (most likely due to a circular import) #114

Open kokolerk opened 5 months ago

kokolerk commented 5 months ago

I just downloaded the repo and ran eval.sh from Evaluation/HumanEval in bash (with deepseek-coder-1.3b-base).

But I have the following errors:

Reading samples...
100%|████████████████████████████████████████████████████████████████████████████████████████| 164/164 [00:00<00:00, 13548.93it/s]
Running test suites...
0%| | 0/164 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/tiger/deepcode/Evaluation/HumanEval/eval_pal.py", line 42, in <module>
    evaluator.eval_model(model, accelerator)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/tiger/deepcode/Evaluation/HumanEval/humaneval.py", line 125, in eval_model
    self._calculate_final_score(accelerator)
  File "/opt/tiger/deepcode/Evaluation/HumanEval/humaneval.py", line 159, in _calculate_final_score
    res = evaluate_functional_correctness(input_file=logfilepath, problem_file=os.path.join(self.data_root, f"humaneval-{self.language}.jsonl"), tmp_dir=self.log_dir, timeout=timeout, language=runlang)
  File "/opt/tiger/deepcode/Evaluation/HumanEval/human_eval/evaluation.py", line 277, in evaluate_functional_correctness
    result = future.result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 433, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/tiger/deepcode/Evaluation/HumanEval/human_eval/execution.py", line 549, in check_correctness
    manager = Manager()
  File "/usr/lib/python3.9/multiprocessing/context.py", line 55, in Manager
    from .managers import SyncManager
ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers' (most likely due to a circular import) (/usr/lib/python3.9/multiprocessing/managers.py)
2024-02-01 13:34:30.269 n188-182-020:19533:23012 [1] NCCL INFO [Service thread] Connection closed by localRank 0
2024-02-01 13:34:30.269 n188-182-020:19532:23014 [0] NCCL INFO [Service thread] Connection closed by localRank 0
2024-02-01 13:34:30.269 n188-182-020:19534:23013 [2] NCCL INFO [Service thread] Connection closed by localRank 0
2024-02-01 13:34:34.341 n188-182-020:19532:19532 [0] NCCL INFO comm 0xb9bbef60 rank 0 nranks 3 cudaDev 0 busId 1a000 - Abort COMPLETE
[2024-02-01 13:34:37,641] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19533 closing signal SIGTERM
[2024-02-01 13:34:37,641] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19534 closing signal SIGTERM
[2024-02-01 13:34:38,307] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 19532) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1033, in <module>
    main()
  File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1029, in main
    launch_command(args)
  File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1014, in launch_command
    multi_gpu_launcher(args)
  File "/home/tiger/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 672, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

eval_pal.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2024-02-01_13:34:37
  host : n188-182-020.byted.org
  rank : 0 (local_rank: 0)
  exitcode : 1 (pid: 19532)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

This looks like a problem with the multiprocessing import, since I haven't changed any of the code. Could you help me solve it?

xuewenman commented 4 months ago

@kokolerk Hello, have you solved this problem?

mst272 commented 3 months ago

This is probably a Python version issue. Changing the environment's Python version to 3.8 will likely fix it.

Calvinnncy97 commented 3 months ago

It can't run inside the ThreadPoolExecutor context. Simply removing the executor and running the checks in a plain for loop should solve it, although it will run a bit slower.
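
For anyone wanting to try this, here is a minimal sketch of the change, not the repository's actual code. `check_one` and the sample dictionaries are hypothetical stand-ins for the per-sample check in human_eval/execution.py and its inputs; only the shape of the loop is the point. The first function mirrors the thread-pool pattern the traceback goes through, and the second is the sequential replacement.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_one(sample):
    """Hypothetical stand-in for the real per-sample check (assumption, not repo code)."""
    passed = sample["completion"].strip() == "ok"
    return {"task_id": sample["task_id"], "passed": passed}


def evaluate_threaded(samples, n_workers=4):
    """Thread-pool pattern: each check runs on a worker thread."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(check_one, s) for s in samples]
        # Results arrive in completion order, not submission order.
        return [f.result() for f in as_completed(futures)]


def evaluate_sequential(samples):
    """Workaround: call the check directly in a for loop on the main thread."""
    results = []
    for s in samples:
        results.append(check_one(s))
    return results


# Made-up demo data, only to show that both variants yield the same results.
samples = [
    {"task_id": "demo/0", "completion": "ok"},
    {"task_id": "demo/1", "completion": "fail"},
]
sequential_results = evaluate_sequential(samples)
print(sequential_results)
```

In the sequential version, anything the check does internally (such as creating a `Manager()`) happens on the main thread one sample at a time, which is why it is slower but avoids the worker-thread code path shown in the traceback.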