Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0 licensed.
RuntimeError: cutlassF: no kernel found to launch! #471
I followed the documented setup (on a single V100 GPU). Running generate.py works fine, but running lora.py fails with the following error:
(llama) yaxuanw@SINIAN-01:~/lit-llama$ python finetune/lora.py
/home/yaxuanw/lit-llama/finetune/lora.py:216: JsonargparseDeprecationWarning:
Only use the public API as described in https://jsonargparse.readthedocs.io/en/stable/#api-reference.
Importing from jsonargparse.cli is kept only to avoid breaking code that does not correctly use the public
API. It will no longer be available from v5.0.0.
from jsonargparse.cli import CLI
Seed set to 1337
Traceback (most recent call last):
  File "/home/yaxuanw/lit-llama/finetune/lora.py", line 218, in <module>
    CLI(main)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/jsonargparse/_cli.py", line 96, in CLI
    return _run_component(components, cfg_init)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/jsonargparse/_cli.py", line 181, in _run_component
    return component(**cfg)
  File "/home/yaxuanw/lit-llama/finetune/lora.py", line 78, in main
    train(fabric, model, optimizer, train_data, val_data, tokenizer_path, out_dir)
  File "/home/yaxuanw/lit-llama/finetune/lora.py", line 112, in train
    logits = model(input_ids)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/lightning/fabric/wrappers.py", line 119, in forward
    output = self._forward_module(*args, **kwargs)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yaxuanw/lit-llama/lit_llama/model.py", line 104, in forward
    x, _ = block(x, rope, mask, max_seq_length)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yaxuanw/lit-llama/lit_llama/model.py", line 163, in forward
    h, new_kv_cache = self.attn(self.rms_1(x), rope, mask, max_seq_length, input_pos, kv_cache)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yaxuanw/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yaxuanw/lit-llama/lit_llama/model.py", line 228, in forward
    y = F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)
RuntimeError: cutlassF: no kernel found to launch!
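For context on the error: `cutlassF` is the memory-efficient fused backend of `F.scaled_dot_product_attention`. The fused backends have stricter dtype and GPU-architecture requirements than the plain math fallback (the V100 is sm_70, while flash attention and bf16 support in the fused kernels generally require sm_80+), so "no kernel found to launch" usually means no fused kernel matches the dtype/arch combination. A possible workaround, not verified on this exact setup, is to force the math backend around the attention call; this sketch uses `torch.backends.cuda.sdp_kernel` from PyTorch 2.0–2.2 (newer releases expose `torch.nn.attention.sdpa_kernel` instead):

```python
import torch
import torch.nn.functional as F

# Dummy (batch, heads, seq, head_dim) tensors; on a real run these would be
# the q/k/v produced inside lit_llama/model.py's attention forward.
q = k = v = torch.randn(1, 4, 16, 32)

# Disable the fused flash/memory-efficient kernels and fall back to the
# math implementation, which supports all dtypes and architectures.
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_mem_efficient=False, enable_math=True
):
    y = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0)

print(tuple(y.shape))  # same shape as q: (1, 4, 16, 32)
```

The math backend is slower and uses more memory, so it is a diagnostic fallback rather than a fix; if it works, the underlying issue is likely the bf16 precision that lora.py selects by default, which the V100 does not support in the fused kernels.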