Rubiksman78 / MonikA.I

Submod for MAS with AI-based features
MIT License

Torch not compiled with CUDA enabled. #81

Closed: UrfinDjus closed this issue 10 months ago

UrfinDjus commented 10 months ago

I'm using the latest versions of the mod and the submod, and I've run update.bat and everything. I'm trying to run the RWKV 430M model on my GTX 650 4GB and get the following error:

C:\Users\Урфин>cd C:\Users\Урфин\Desktop\MonikA.I

C:\Users\Урфин\Desktop\MonikA.I>run.bat
Requirement already satisfied: numpy==1.23.0 in c:\users\урфин\desktop\monika.i\libs\pythonlib\lib\site-packages (1.23.0)
DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: voicefixer 0.1.2 has a non-standard dependency specifier streamlit>=1.12.0pyyaml. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of voicefixer or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
Loading model - chatbot_models/RWKV-4-Pile-430M-20220808-8066.pth
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 6

Loading chatbot_models/RWKV-4-Pile-430M-20220808-8066.pth ...
Strategy: (total 24+1=25 layers)
* cuda torch.float16, store 25 layers
0-cuda-float16 1-cuda-float16 2-cuda-float16 3-cuda-float16 4-cuda-float16 5-cuda-float16 6-cuda-float16 7-cuda-float16 8-cuda-float16 9-cuda-float16 10-cuda-float16 11-cuda-float16 12-cuda-float16 13-cuda-float16 14-cuda-float16 15-cuda-float16 16-cuda-float16 17-cuda-float16 18-cuda-float16 19-cuda-float16 20-cuda-float16 21-cuda-float16 22-cuda-float16 23-cuda-float16 24-cuda-float16
emb.weight                       fp16      cpu  50277  1024
Traceback (most recent call last):
  File "main.py", line 106, in <module>
    from ChatRWKV.v2.chat import on_message
  File "C:\Users\Урфин\Desktop\MonikA.I\ChatRWKV\v2\chat.py", line 159, in <module>
    model = RWKV(model=args.MODEL_NAME, strategy=args.strategy)
  File "C:\Users\Урфин\Desktop\MonikA.I\libs\pythonlib\lib\site-packages\torch\jit\_script.py", line 303, in init_then_script
    original_init(self, *args, **kwargs)
  File "C:\Users\Урфин\Desktop\MonikA.I\ChatRWKV\v2/../rwkv_pip_package/src\rwkv\model.py", line 182, in __init__
    w[x] = w[x].to(device=DEVICE)
  File "C:\Users\Урфин\Desktop\MonikA.I\libs\pythonlib\lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Rubiksman78 commented 10 months ago

Did you run setup_gpu.bat before using it?
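
The `AssertionError: Torch not compiled with CUDA enabled` above means the PyTorch build installed under `libs\pythonlib` is a CPU-only wheel, so any attempt to move tensors to `cuda` fails before the GPU is even touched. As a quick sanity check, a minimal sketch using standard `torch` calls (run with the submod's bundled Python; the exact version tags will differ per install):

```python
import torch

# A CPU-only wheel has no CUDA runtime compiled in: its version string
# carries no "+cuXXX" tag and torch.version.cuda is None.
print(torch.__version__)          # e.g. "1.13.1+cpu" vs. "1.13.1+cu117"
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False here; True once a CUDA build is installed
```

If `is_available()` still prints `False` after running setup_gpu.bat, the GPU-enabled wheel did not replace the CPU-only one.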

UrfinDjus commented 10 months ago

> Did you run setup_gpu.bat before using it?

I did. Just to make sure, though, I went through the entire installation process again, which sort of fixed that issue. Does running into different errors count as making progress? Anyway, it then refused to launch and complained about my drivers. I reinstalled the drivers, and now I'm running into yet another issue. Here's the log.

C:\Users\Урфин\Desktop\MonikA.I>run.bat
WARNING: Ignoring invalid distribution -orch (c:\users\урфин\desktop\monika.i\libs\pythonlib\lib\site-packages)
Requirement already satisfied: numpy==1.23.0 in c:\users\урфин\desktop\monika.i\libs\pythonlib\lib\site-packages (1.23.0)
WARNING: Ignoring invalid distribution -orch (c:\users\урфин\desktop\monika.i\libs\pythonlib\lib\site-packages)
DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: voicefixer 0.1.2 has a non-standard dependency specifier streamlit>=1.12.0pyyaml. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of voicefixer or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
Loading model - chatbot_models/RWKV-4-Pile-430M-20220808-8066.pth
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 6

Loading chatbot_models/RWKV-4-Pile-430M-20220808-8066.pth ...
Strategy: (total 24+1=25 layers)
* cuda torch.float16, store 25 layers
0-cuda-float16 1-cuda-float16 2-cuda-float16 3-cuda-float16 4-cuda-float16 5-cuda-float16 6-cuda-float16 7-cuda-float16 8-cuda-float16 9-cuda-float16 10-cuda-float16 11-cuda-float16 12-cuda-float16 13-cuda-float16 14-cuda-float16 15-cuda-float16 16-cuda-float16 17-cuda-float16 18-cuda-float16 19-cuda-float16 20-cuda-float16 21-cuda-float16 22-cuda-float16 23-cuda-float16 24-cuda-float16
emb.weight                       fp16      cpu  50277  1024
C:\Users\Урфин\Desktop\MonikA.I\libs\pythonlib\lib\site-packages\torch\cuda\__init__.py:132: UserWarning:
    Found GPU0 NVIDIA GeForce GTX 650 which is of cuda capability 3.0.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is 3.7.

  warnings.warn(old_gpu_warn % (d, name, major, minor, min_arch // 10, min_arch % 10))
blocks.0.ln1.weight              fp16   cuda:0   1024
blocks.0.ln1.bias                fp16   cuda:0   1024
blocks.0.ln2.weight              fp16   cuda:0   1024
blocks.0.ln2.bias                fp16   cuda:0   1024
blocks.0.att.time_decay          fp32   cuda:0   1024
blocks.0.att.time_first          fp32   cuda:0   1024
blocks.0.att.time_mix_k          fp16   cuda:0   1024
blocks.0.att.time_mix_v          fp16   cuda:0   1024
blocks.0.att.time_mix_r          fp16   cuda:0   1024
blocks.0.att.key.weight          fp16   cuda:0   1024  1024
blocks.0.att.value.weight        fp16   cuda:0   1024  1024
blocks.0.att.receptance.weight   fp16   cuda:0   1024  1024
blocks.0.att.output.weight       fp16   cuda:0   1024  1024
blocks.0.ffn.time_mix_k          fp16   cuda:0   1024
blocks.0.ffn.time_mix_r          fp16   cuda:0   1024
blocks.0.ffn.key.weight          fp16   cuda:0   1024  4096
blocks.0.ffn.receptance.weight   fp16   cuda:0   1024  1024
blocks.0.ffn.value.weight        fp16   cuda:0   4096  1024
............................................................................................................................................................................................................................................................................................................................................................................................................
blocks.23.ln1.weight             fp16   cuda:0   1024
blocks.23.ln1.bias               fp16   cuda:0   1024
blocks.23.ln2.weight             fp16   cuda:0   1024
blocks.23.ln2.bias               fp16   cuda:0   1024
blocks.23.att.time_decay         fp32   cuda:0   1024
blocks.23.att.time_first         fp32   cuda:0   1024
blocks.23.att.time_mix_k         fp16   cuda:0   1024
blocks.23.att.time_mix_v         fp16   cuda:0   1024
blocks.23.att.time_mix_r         fp16   cuda:0   1024
blocks.23.att.key.weight         fp16   cuda:0   1024  1024
blocks.23.att.value.weight       fp16   cuda:0   1024  1024
blocks.23.att.receptance.weight  fp16   cuda:0   1024  1024
blocks.23.att.output.weight      fp16   cuda:0   1024  1024
blocks.23.ffn.time_mix_k         fp16   cuda:0   1024
blocks.23.ffn.time_mix_r         fp16   cuda:0   1024
blocks.23.ffn.key.weight         fp16   cuda:0   1024  4096
blocks.23.ffn.receptance.weight  fp16   cuda:0   1024  1024
blocks.23.ffn.value.weight       fp16   cuda:0   4096  1024
ln_out.weight                    fp16   cuda:0   1024
ln_out.bias                      fp16   cuda:0   1024
head.weight                      fp16   cuda:0   1024 50277

Run prompt...
Traceback (most recent call last):
  File "main.py", line 106, in <module>
    from ChatRWKV.v2.chat import on_message
  File "C:\Users\Урфин\Desktop\MonikA.I\ChatRWKV\v2\chat.py", line 210, in <module>
    out = run_rnn(pipeline.encode(init_prompt))
  File "C:\Users\Урфин\Desktop\MonikA.I\ChatRWKV\v2\chat.py", line 178, in run_rnn
    out, model_state = model.forward(tokens, model_state)
  File "C:\Users\Урфин\Desktop\MonikA.I\ChatRWKV\v2/../rwkv_pip_package/src\rwkv\model.py", line 361, in forward
    x, state[i*5+0], state[i*5+1], state[i*5+2], state[i*5+3] = ATT(
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "C:\Users\Урфин\Desktop\MonikA.I\ChatRWKV\v2/../rwkv_pip_package/src\rwkv\model.py", line 264, in att_seq
        rx = xx * r_mix + sx * (1 - r_mix)

        r = torch.sigmoid(rx @ rw)
                          ~~~~~~~ <--- HERE
        k = (kx @ kw).float()
        v = (vx @ vw).float()
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
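
Two separate problems show up in this log. The UserWarning reports that the GTX 650 has CUDA compute capability 3.0, below the 3.7 minimum this PyTorch build supports, and `CUBLAS_STATUS_ALLOC_FAILED` is the error cuBLAS raises when it cannot allocate GPU memory for its handle or workspace. A small check along these lines (standard `torch.cuda` calls) reports what PyTorch sees on each device:

```python
import torch

# Report each visible CUDA device's compute capability and total memory.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, "
          f"compute capability {major}.{minor}, "
          f"{props.total_memory / 1024**3:.1f} GiB total memory")
```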
Rubiksman78 commented 10 months ago

It is possible that the model takes up too many computational resources on your GPU. You can try changing the strategy in pygmalion/pygmalion_config.yml to load some of the model's layers on your CPU (see this part of the wiki).
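
For reference, ChatRWKV strategy strings control how layers are split across devices: per the ChatRWKV documentation, `cuda fp16 *10 -> cpu fp32` keeps the first 10 layers on the GPU in fp16 and runs the rest on the CPU in fp32, while `cpu fp32` avoids the GPU entirely at the cost of speed. A minimal sketch of the loading call with such a strategy (the same string goes into pygmalion/pygmalion_config.yml; the model path is taken from the log above):

```python
from rwkv.model import RWKV  # the rwkv_pip_package bundled with ChatRWKV

# First 10 layers on the GPU in fp16, the rest on the CPU in fp32,
# trading speed for a much smaller VRAM footprint.
model = RWKV(
    model="chatbot_models/RWKV-4-Pile-430M-20220808-8066",
    strategy="cuda fp16 *10 -> cpu fp32",
)
```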

UrfinDjus commented 10 months ago

> It is possible that the model takes up too many computational resources on your GPU. You can try changing the strategy in pygmalion/pygmalion_config.yml to load some of the model's layers on your CPU (see this part of the wiki).

That worked. Thanks a lot.