HITsz-TMG / UMOE-Scaling-Unified-Multimodal-LLMs

Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
https://uni-moe.github.io/

Error when running demo.py #5

Open kevinkhanhvu opened 1 month ago

kevinkhanhvu commented 1 month ago

When I try to run demo.py on a single H100 (80 GB), I get the error below while the model is loading. I have downloaded all the models listed in the requirements and installed all the dependencies. Please help me check this issue: @longyuewangdcu @eltociear @YanshekWoo @imryanxu @expapa

While copying the parameter named "base_model.model.model.layers.30.mlp.experts.3.down_proj.lora_B.default.weight", whose dimensions in the model are torch.Size([4096, 8]) and whose dimensions in the checkpoint are torch.Size([4096, 8]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.30.mlp.gate.lora_A.default.weight", whose dimensions in the model are torch.Size([8, 4096]) and whose dimensions in the checkpoint are torch.Size([8, 4096]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.30.mlp.gate.lora_B.default.weight", whose dimensions in the model are torch.Size([4, 8]) and whose dimensions in the checkpoint are torch.Size([4, 8]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight", whose dimensions in the model are torch.Size([8, 4096]) and whose dimensions in the checkpoint are torch.Size([8, 4096]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight", whose dimensions in the model are torch.Size([4096, 8]) and whose dimensions in the checkpoint are torch.Size([4096, 8]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.weight", whose dimensions in the model are torch.Size([8, 4096]) and whose dimensions in the checkpoint are torch.Size([8, 4096]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.weight", whose dimensions in the model are torch.Size([4096, 8]) and whose dimensions in the checkpoint are torch.Size([4096, 8]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight", whose dimensions in the model are torch.Size([8, 4096]) and whose dimensions in the checkpoint are torch.Size([8, 4096]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight", whose dimensions in the model are torch.Size([4096, 8]) and whose dimensions in the checkpoint are torch.Size([4096, 8]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight", whose dimensions in the model are torch.Size([8, 4096]) and whose dimensions in the checkpoint are torch.Size([8, 4096]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).
While copying the parameter named "base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight", whose dimensions in the model are torch.Size([4096, 8]) and whose dimensions in the checkpoint are torch.Size([4096, 8]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n',).

YunxinLi commented 1 month ago

We are checking the code to find the cause of the problem as quickly as we can. Could you tell me which version of the model you are running? For example, Uni-MoE-speech-base-interval or Uni-MoE-speech-v1.5, as suggested in demo.py?

expapa commented 1 month ago

Thank you so much for posting this issue. The demo was not working properly because of some problems in the code; those problems have now been fixed and the code has been updated. However, the error you encountered may not be related to the code itself. It looks like a mismatch between your PyTorch version and your CUDA version. Could you please check this?

kevinkhanhvu commented 1 month ago

Thanks @expapa. I only installed the dependencies listed in env.txt. I'll check my CUDA and torch versions again!
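
For anyone hitting the same message: "no kernel image is available for execution on the device" usually means the installed PyTorch binary was not built with kernels for the GPU's architecture (an H100 reports compute capability 9.0, i.e. sm_90). Below is a minimal sketch of the version check suggested above. The helper name `arch_supported` is hypothetical and its compatibility rule is a simplification (same-major `sm_XY` binaries and older `compute_XY` PTX are treated as usable); the `torch.cuda` calls are standard PyTorch APIs.

```python
import re

# Matches entries such as "sm_80" or "compute_90"; suffixed variants are ignored.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)$")


def arch_supported(capability, arch_list):
    """Rough check: does a build's arch list cover a GPU's compute capability?

    Simplified rule (an assumption, not an official PyTorch API):
      - sm_XY binaries run on GPUs with the same major and minor >= Y
      - compute_XY PTX can be JIT-compiled for any GPU at or above X.Y
    """
    major, minor = capability
    for arch in arch_list:
        m = _ARCH_RE.match(arch)
        if m is None:
            continue
        kind, a_major, a_minor = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch version:", torch.__version__)
        print("built against CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)  # e.g. (9, 0) on an H100
            archs = torch.cuda.get_arch_list()         # e.g. ["sm_80", "sm_90", ...]
            print("GPU:", torch.cuda.get_device_name(0), "capability:", cap)
            print("architectures in this build:", archs)
            print("GPU covered by this build:", arch_supported(cap, archs))
        else:
            print("CUDA is not available to this PyTorch build")
```

If the last line prints `False`, reinstalling a PyTorch build compiled against a CUDA version that includes the GPU's architecture is the likely fix, rather than any change to demo.py.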