On A30 it works fine.
On V100, running llama-13B with two GPUs fails with the errors below.
NVIDIA-SMI 510.85.02, Driver Version: 510.85.02, CUDA Version: 11.7
[ERROR][2023-09-12 05:21:16.518][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
[ERROR][2023-09-12 05:21:16.518][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
[ERROR][2023-09-12 05:21:16.518][kernel.cc:176] DoExecute kernel [/tok_embeddings/ParallelEmbedding] failed: other error
[ERROR][2023-09-12 05:21:16.518][kernel.cc:176] DoExecute kernel [/tok_embeddings/ParallelEmbedding] failed: other error
[ERROR][2023-09-12 05:21:16.518][sequential_scheduler.cc:130] exec kernel[/tok_embeddings/ParallelEmbedding] of type[pmx:ParallelEmbedding:1] failed: other error
[ERROR][2023-09-12 05:21:16.518][sequential_scheduler.cc:130] exec kernel[/tok_embeddings/ParallelEmbedding] of type[pmx:ParallelEmbedding:1] failed: other error
[ERROR][2023-09-12 05:21:16.518][runtime_impl.cc:333] Run() failed: other error
[ERROR][2023-09-12 05:21:16.518][runtime_impl.cc:333] Run() failed: other error
[ERROR][2023-09-12 05:21:16.519][llama_worker.cc:922] ParallelExecute(RunModelTask) failed.
[INFO][2023-09-12 05:21:16.519][llama_worker.cc:1043] waiting for request ...
[ERROR][2023-09-12 05:21:16.520][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
[ERROR][2023-09-12 05:21:16.520][kernel.cc:176] DoExecute kernel [/tok_embeddings/ParallelEmbedding] failed: other error
[ERROR][2023-09-12 05:21:16.520][sequential_scheduler.cc:130] exec kernel[/tok_embeddings/ParallelEmbedding] of type[pmx:ParallelEmbedding:1] failed: other error
[ERROR][2023-09-12 05:21:16.520][runtime_impl.cc:333] Run() failed: other error
[ERROR][2023-09-12 05:21:16.520][nccl_utils.h:110] NCCL error(code:1) on ncclGroupEnd
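NCCL error code 1 is ncclUnhandledCudaError, so the ncclGroupEnd failure above usually points to a CUDA-level problem on one of the ranks rather than to the embedding kernel itself. As a minimal sketch, independent of ppl.llm.serving (the build command and buffer sizes are assumptions, not taken from this issue), the same grouped-call pattern can be reproduced standalone; if this also fails on the V100 box, the NCCL/driver/GPU setup is at fault. Running it with NCCL_DEBUG=INFO set shows which transport NCCL picks and where the CUDA error originates.

```cpp
// Minimal two-GPU NCCL sanity check (hedged sketch, not from this repo).
// Build (assumed): nvcc -o nccl_check nccl_check.cu -lnccl
#include <cstdio>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); \
  if (e != cudaSuccess) { printf("CUDA error %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); return 1; } } while (0)
#define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); \
  if (r != ncclSuccess) { printf("NCCL error %s:%d: %s\n", __FILE__, __LINE__, ncclGetErrorString(r)); return 1; } } while (0)

int main() {
  const int nDev = 2;                 // mirrors "tensor_parallel_size": 2
  int devs[nDev] = {0, 1};
  ncclComm_t comms[nDev];
  float* buf[nDev];
  cudaStream_t streams[nDev];

  for (int i = 0; i < nDev; ++i) {
    CHECK_CUDA(cudaSetDevice(devs[i]));
    CHECK_CUDA(cudaMalloc(&buf[i], 1024 * sizeof(float)));
    CHECK_CUDA(cudaStreamCreate(&streams[i]));
  }
  CHECK_NCCL(ncclCommInitAll(comms, nDev, devs));

  // Same grouped-call pattern that fails in the log at ncclGroupEnd.
  CHECK_NCCL(ncclGroupStart());
  for (int i = 0; i < nDev; ++i) {
    CHECK_CUDA(cudaSetDevice(devs[i]));
    CHECK_NCCL(ncclAllReduce(buf[i], buf[i], 1024, ncclFloat, ncclSum,
                             comms[i], streams[i]));
  }
  CHECK_NCCL(ncclGroupEnd());

  for (int i = 0; i < nDev; ++i) {
    CHECK_CUDA(cudaSetDevice(devs[i]));
    CHECK_CUDA(cudaStreamSynchronize(streams[i]));
    CHECK_CUDA(cudaFree(buf[i]));
    ncclCommDestroy(comms[i]);
  }
  printf("NCCL two-GPU allreduce OK\n");
  return 0;
}
```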
Here is my config.json:
{ "model_dir": "/data/codes/ppl/llama-13b", "model_param_path": "/data/codes/ppl/llama-13b/params.json", "tokenizer_path": "/data/LLaMA-7B/tokenizer.model", "tensor_parallel_size": 2, "top_p": 0.0, "top_k": 1, "max_tokens_scale": 0.94, "max_tokens_per_request": 4096, "max_running_batch": 1024, "host": "0.0.0.0", "port": 10086 }
Please try again with the latest version, and please file the issue using our template.
@Vincent-syr hello, how did you solve this? I recently hit the same problem with the latest code as well.