SCIR-HI / Huatuo-Llama-Med-Chinese

Repo for BenTsao [original name: HuaTuo (华驼)], Instruction-tuning Large Language Models with Chinese Medical Knowledge. 本草(原名:华驼)模型仓库,基于中文医学知识的大语言模型指令微调
Apache License 2.0

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select) #89

Open daiyizheng opened 9 months ago

daiyizheng commented 9 months ago

I hit this error when running infer.py with LLaMA 65B on multiple A100 GPUs; finetune.py runs without problems.

daiyizheng commented 9 months ago

/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/run/user/80104/vscode-git-4b808d81bf.sock')}
  warn(msg)
/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/run/user/80104/vscode-ipc-0b328df5-e364-494a-b230-9f7e99271b5b.sock')}
  warn(msg)
/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('() { eval /usr/bin/modulecmd bash $*\n}')}
  warn(msg)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. The class this function is called from is 'LlamaTokenizer'.
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.

Loading checkpoint shards:   0%|          | 0/81 [00:00<?, ?it/s]
[progress output trimmed]
Loading checkpoint shards: 100%|██████████| 81/81 [01:15<00:00, 1.07it/s]
Traceback (most recent call last):
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 132, in <module>
    fire.Fire(main)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 118, in main
    infer_from_json(instruct_dir)
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 105, in infer_from_json
    model_output = evaluate(instruction)
  File "/slurm/home/yrd/shaolab/daiyizheng/nlp/Huatuo-Llama-Med-Chinese/infer.py", line 87, in evaluate
    generation_output = model.generate(
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/peft/peft_model.py", line 731, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/generation/utils.py", line 1611, in generate
    return self.beam_search(
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/generation/utils.py", line 2982, in beam_search
    model_kwargs["past_key_values"] = self._reorder_cache(model_kwargs["past_key_values"], beam_idx)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 762, in _reorder_cache
    reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
  File "/slurm/home/yrd/shaolab/daiyizheng/.conda/envs/llama/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 762, in <genexpr>
    reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

daiyizheng commented 9 months ago

My workaround (not a clean one):