BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

XXX is currently not supported in Torchscript: I don't know how to solve this problem, there is something wrong with cuda in my device #258

Open LeC-Z opened 2 months ago

LeC-Z commented 2 months ago

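I have tried every device I have, including V100, A100, and L40S, and none of them can run RWKV-v5/demo-training-prepare.sh successfully (possibly because the hardware is fairly old).

The closest attempt produced the following error: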

RWKV_MY_TESTING x060
Using /root/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu116/wkv6/build.ninja...
Building extension module wkv6...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module wkv6...
Traceback (most recent call last):
  File "/2/3/casanovo/tests/RWKV-LM/RWKV-v5/train.py", line 255, in <module>
    model = RWKV(args)
  File "/2/3/casanovo/tests/RWKV-LM/RWKV-v5/src/model.py", line 950, in __init__
    self.blocks = nn.ModuleList([Block(args, i) for i in range(args.n_layer)])
  File "/2/3/casanovo/tests/RWKV-LM/RWKV-v5/src/model.py", line 950, in <listcomp>
    self.blocks = nn.ModuleList([Block(args, i) for i in range(args.n_layer)])
  File "/2/3/casanovo/tests/RWKV-LM/RWKV-v5/src/model.py", line 854, in __init__
    self.att = RWKV_Tmix_x060(args, layer_id)
  File "/opt/conda/envs/jun/lib/python3.10/site-packages/torch/jit/_script.py", line 307, in init_then_script
    ] = torch.jit._recursive.create_script_module(self, make_stubs, share_types=not added_methods_in_init)
  File "/opt/conda/envs/jun/lib/python3.10/site-packages/torch/jit/_recursive.py", line 476, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/opt/conda/envs/jun/lib/python3.10/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
    create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
  File "/opt/conda/envs/jun/lib/python3.10/site-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
    concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
  File "/opt/conda/envs/jun/lib/python3.10/site-packages/torch/jit/_recursive.py", line 863, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/opt/conda/envs/jun/lib/python3.10/site-packages/torch/jit/_script.py", line 1343, in script
    fn = torch._C._jit_script_compile(
RuntimeError: 
Python builtin <built-in method apply of FunctionMeta object at 0x747be90> is currently not supported in Torchscript:
  File "/2/3/casanovo/tests/RWKV-LM/RWKV-v5/src/model.py", line 148
        def RUN_CUDA_RWKV6(r, k, v, w, u):
            return WKV_6.apply(r, k, v, w, u)
                   ~~~~~~~~~~~ <--- HERE
'RUN_CUDA_RWKV6' is being compiled since it was called from 'RWKV_Tmix_x060.forward'
  File "/2/3/casanovo/tests/RWKV-LM/RWKV-v5/src/model.py", line 372
        w = self.time_decay + ww

        x = RUN_CUDA_RWKV6(r, k, v, w, u=self.time_faaaa)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        x = x.view(B * T, C)        

I searched the existing issues and also tried Baidu and Google, but could not find a solution. What is causing this?

BlinkDL commented 2 months ago

Try installing the latest torch 2.4.
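
Before upgrading, it can help to confirm which torch version the environment actually has. The cache path `py310_cu116` in the log suggests a CUDA 11.6 build, which would be older than 2.4. Below is a minimal sketch of such a check, using only the standard library. The helper names `parse_version` and `is_new_enough` and the constant `MIN_TORCH` are hypothetical and are not part of RWKV-LM or PyTorch.

```python
import re
from importlib import metadata

# Minimum version taken from the reply above (torch 2.4); adjust if advised otherwise.
MIN_TORCH = (2, 4)


def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '1.13.1+cu116' or '2.4.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_new_enough(version: str, minimum: tuple[int, int] = MIN_TORCH) -> bool:
    """Return True when the (major, minor) pair is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        # Reads the installed package metadata without importing torch itself.
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        status = "OK" if is_new_enough(installed) else "older than 2.4, consider upgrading"
        print(f"torch {installed}: {status}")
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "2.10" sorts before "2.4" lexicographically.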