HazyResearch / H3

Language Modeling with the H3 State Space Model
Apache License 2.0

Error Running `generate_text_h3.py` (`CUDA error: CUBLAS_STATUS_NOT_INITIALIZED`) #17

Closed gdebayan closed 1 year ago

gdebayan commented 1 year ago

Hey There!

I followed the steps in the README.md, and when I run `generate_text_h3.py` I get the following error.

Some notes:
1. I installed https://github.com/HazyResearch/flash-attention from source.
2. As I got import errors, I installed the libraries one by one (using the default versions).
3. I'm using a Linux Ubuntu machine.

I'll be happy to share any other info (versions, etc.) you might need.

Here is the error trace for your reference:

(h3) debayan@lambda-femtosense-2:~/h3$ PYTHONPATH=$(pwd)/H3 python3 -i  H3/examples/generate_text_h3.py --ckpt H3-125M/model.pt --prompt "Hungry Hungry Hippos: Towards Language Modeling With State" --dmodel 768 --nlayer 12 --attn-layer-idx 6 --nheads=12
args.ckpt H3-125M/model.pt
Traceback (most recent call last):
  File "/home/debayan/h3/H3/examples/generate_text_h3.py", line 60, in <module>
    output_ids = model.generate(input_ids=input_ids, max_length=max_length,
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/flash_attn-0.2.8-py3.10-linux-x86_64.egg/flash_attn/utils/generation.py", line 150, in generate
    output = decode(input_ids, self, max_length, top_k=top_k, top_p=top_p,
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/flash_attn-0.2.8-py3.10-linux-x86_64.egg/flash_attn/utils/generation.py", line 107, in decode
    logits = model(input_ids, inference_params=inference_params).logits[:, -1]
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/h3/H3/src/models/ssm_seq.py", line 187, in forward
    hidden_states = self.backbone(input_ids, position_ids=position_ids,
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/h3/H3/src/models/ssm_seq.py", line 142, in forward
    hidden_states, residual = layer(hidden_states, residual, mixer_kwargs=mixer_kwargs)
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/flash_attn-0.2.8-py3.10-linux-x86_64.egg/flash_attn/modules/block.py", line 126, in forward
    hidden_states = self.mixer(hidden_states, **mixer_kwargs)
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/h3/H3/src/models/ssm/h3.py", line 114, in forward
    q = self.q_proj.weight @ u.T + self.q_proj.bias.to(dtype).unsqueeze(-1)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
>>> 
gdebayan commented 1 year ago

Also, for what it's worth, this is the error I get in CPU mode:

(h3) debayan@lambda-femtosense-2:~/h3$ PYTHONPATH=$(pwd)/H3 python3 -i  H3/examples/generate_text_h3.py --ckpt H3-125M/model.pt --prompt "Hungry Hungry Hippos: Towards Language Modeling With State" --dmodel 768 --nlayer 12 --attn-layer-idx 6 --nheads=12
args.ckpt H3-125M/model.pt
Traceback (most recent call last):
  File "/home/debayan/h3/H3/examples/generate_text_h3.py", line 60, in <module>
    output_ids = model.generate(input_ids=input_ids, max_length=max_length,
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/flash_attn-0.2.8-py3.10-linux-x86_64.egg/flash_attn/utils/generation.py", line 150, in generate
    output = decode(input_ids, self, max_length, top_k=top_k, top_p=top_p,
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/flash_attn-0.2.8-py3.10-linux-x86_64.egg/flash_attn/utils/generation.py", line 107, in decode
    logits = model(input_ids, inference_params=inference_params).logits[:, -1]
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/h3/H3/src/models/ssm_seq.py", line 187, in forward
    hidden_states = self.backbone(input_ids, position_ids=position_ids,
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/h3/H3/src/models/ssm_seq.py", line 142, in forward
    hidden_states, residual = layer(hidden_states, residual, mixer_kwargs=mixer_kwargs)
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/flash_attn-0.2.8-py3.10-linux-x86_64.egg/flash_attn/modules/block.py", line 106, in forward
    hidden_states = self.norm1(residual.to(dtype=self.norm1.weight.dtype))
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/debayan/miniconda3/envs/h3/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
>>> 
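(For reference on the CPU trace: the PyTorch CPU build in use here has no fp16 LayerNorm kernel, so a model loaded in half precision has to be cast back to float32 before running on CPU. A minimal sketch of that workaround; the `Sequential` model below is a hypothetical stand-in for the H3 checkpoint, not the actual loading code:)

```python
import torch

def to_cpu_fp32(model: torch.nn.Module) -> torch.nn.Module:
    """Move a (possibly fp16) model to CPU and promote its parameters to float32."""
    return model.cpu().float()

# Hypothetical stand-in for the H3 model; the real script loads model.pt instead.
net = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LayerNorm(8)).half()
net = to_cpu_fp32(net)

x = torch.randn(1, 8)  # float32 input to match the converted model
out = net(x)           # no "LayerNormKernelImpl not implemented for 'Half'" now
```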
DanFu09 commented 1 year ago

Thanks for the error reports! What GPU are you running on? Can you share the output of `nvidia-smi` and your CUDA toolkit version (e.g., `nvcc --version`)?

gdebayan commented 1 year ago

Hey @DanFu09, apologies for the delayed response (GitHub did not notify me of your comment).

Looks like it was a simple issue: I didn't have enough free GPU memory. Works fine now, thanks! Reference: https://discuss.pytorch.org/t/cuda-error-cublas-status-not-initialized-when-calling-cublascreate-handle/125450/2
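For anyone else who lands here: `CUBLAS_STATUS_NOT_INITIALIZED` can surface when cuBLAS fails to allocate its workspace on a nearly-full GPU, so it can help to check free memory before loading the checkpoint. A sketch (the threshold is illustrative; `torch.cuda.mem_get_info` needs a reasonably recent PyTorch):

```python
import torch

def has_free_gpu_memory(required_gib: float) -> bool:
    """Return True if the current CUDA device reports at least `required_gib`
    GiB of free memory; False when no GPU is visible at all."""
    if not torch.cuda.is_available():
        return False
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 2**30 >= required_gib

# Replace 2.0 with whatever the checkpoint plus activations actually need.
print(has_free_gpu_memory(2.0))
```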