arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)` #146

[Open] blackblue9 opened this issue 8 months ago

blackblue9 commented 8 months ago

When I try to merge two Yi-34B-Chat models into one MoE model, the last step ("expert prompts") fails with: RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle). But when I merged the Yi-34B base model instead, everything worked fine. What's the problem?

The merge command I used is as follows:

mergekit-moe /mnt/home/mergekit-mixtral/yi-2x34B-chat-merge.yaml /mnt/home/merge_moe/yi_2x34B_chat --trust-remote-code --i-understand-this-is-not-useful-without-training --device cuda
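For context, a mergekit-moe config such as yi-2x34B-chat-merge.yaml generally takes the shape sketched below. This is an illustrative reconstruction, not the reporter's actual file; the model paths and prompts are placeholder assumptions.

```yaml
# Hypothetical sketch of yi-2x34B-chat-merge.yaml -- the real file was not
# posted. Model paths and prompts below are placeholders.
base_model: 01-ai/Yi-34B-Chat
gate_mode: hidden   # computes router weights from hidden states, so it must
                    # run inference on the base model
dtype: bfloat16
experts:
  - source_model: 01-ai/Yi-34B-Chat
    positive_prompts:
      - "Answer general knowledge questions helpfully."
  - source_model: /mnt/home/models/yi-34b-chat-finetune   # placeholder path
    positive_prompts:
      - "Work through math and coding problems step by step."
```

The gate_mode: hidden setting is what makes the "expert prompts" step run a full forward pass over the model, which is exactly where the traceback below shows the crash.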

The complete log is as follows:

Warm up loaders: 100%|████████████████████████████████████████| 3/3 [00:00<00:00, 346.19it/s]
100%|████████████████████████████████████████| 9/9 [14:00<00:00, 93.39s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 15/15 [00:25<00:00, 1.72s/it]
expert prompts: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/miniconda3/envs/mergekit/bin/mergekit-moe", line 8, in <module>
    sys.exit(main())
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/home/quxinji/mergekit-mixtral/mergekit/options.py", line 76, in wrapper
    f(*args, **kwargs)
  File "/mnt/home/quxinji/mergekit-mixtral/mergekit/scripts/mixtral_moe.py", line 453, in main
    build(
  File "/mnt/home/mergekit-mixtral/mergekit/scripts/mixtral_moe.py", line 368, in build
    gate_vecs = get_gate_params(
  File "/mnt/home/mergekit-mixtral/mergekit/scripts/mixtral_moe.py", line 175, in get_gate_params
    hidden_states = _do_it(tokenize_prompts(expert.positive_prompts, tokenizer))
  File "/mnt/home/mergekit-mixtral/mergekit/scripts/mixtral_moe.py", line 169, in _do_it
    return get_hidden_states(
  File "/mnt/home/mergekit-mixtral/mergekit/scripts/mixtral_moe.py", line 76, in get_hidden_states
    output: CausalLMOutputWithPast = model(
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
    outputs = self.model(
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1070, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 798, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 386, in forward
    query_states = self.q_proj(hidden_states)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/miniconda3/envs/mergekit/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

cg123 commented 8 months ago

This looks like you might not have enough VRAM. For the hidden gate mode you need enough memory to actually run inference with the model; for a 34B model in FP16 you would need an A100. Have you tried the --load-in-8bit or --load-in-4bit arguments?
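Rough arithmetic behind that: 34e9 parameters × 2 bytes/param is about 68 GB in FP16 before activations, versus roughly 34 GB in 8-bit or 17 GB in 4-bit. As a sketch, the reporter's command with 4-bit loading would look like this (same invocation as above, just adding the quantization flag):

```sh
# Same merge as before, but load the model in 4-bit for the hidden-state pass.
# Rough budget: ~68 GB FP16 vs ~17 GB 4-bit for a 34B model, plus activations.
mergekit-moe /mnt/home/mergekit-mixtral/yi-2x34B-chat-merge.yaml \
    /mnt/home/merge_moe/yi_2x34B_chat \
    --trust-remote-code \
    --i-understand-this-is-not-useful-without-training \
    --device cuda \
    --load-in-4bit
```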