QwenLM / Qwen

The official repository of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Apache License 2.0

[BUG] CUDA Error: invalid device function /tmp/pip-req-build-5rlg4jgm/ln_fwd_kernels.cuh 236 #1198

Closed. taoqinghua closed this issue 5 months ago.

taoqinghua commented 5 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Running bash finetune/finetune_qlora_single_gpu.sh -m /data/shared/test_docker/Qwen-7B-Chat-Int4/ -d /data/shared/test_docker/chat.json inside the container fails with CUDA Error: invalid device function /tmp/pip-req-build-5rlg4jgm/ln_fwd_kernels.cuh 236. The CUDA version is 11.7; switching to CUDA 11.8, 12.1, or 12.4 produces the same error. The file /tmp/pip-req-build-5rlg4jgm/ln_fwd_kernels.cuh cannot be found inside the container, so I don't know what the actual cause is. Can anyone offer some guidance?

Expected Behavior

What is the actual cause? I have switched between several CUDA versions and the error is always the same. Any help would be appreciated.

Steps To Reproduce

1. Start the container: docker run -itd -v /***:/data/shared/test_docker --name test_qwen --gpus all --shm-size 12G qwenllm/qwen /bin/bash
2. Enter the container: docker exec -it test_qwen04 /bin/bash
3. Run: bash finetune/finetune_qlora_single_gpu.sh -m /data/shared/test_docker/Qwen-7B-Chat-Int4/ -d /data/shared/test_docker/chat.json

Environment

- OS: Ubuntu
- Python:
- Transformers:
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.7
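
The Python and Transformers fields above are empty; a minimal sketch for filling them in from inside the container (assuming both packages are importable there):

    python -c "import sys, torch, transformers; print(sys.version); print(torch.__version__); print(transformers.__version__)"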

Anything else?

No response

jklj077 commented 5 months ago

If you are using the provided Docker image with the tag qwenllm/qwen(:latest), it is based on CUDA 11.7 and bundles the layer_norm module from FlashAttention v2, which is where that invalid device function (cudaOccupancyMaxActiveBlocksPerMultiprocessor, a CUDA runtime API) is called.

It is likely that your NVIDIA driver is too old to support CUDA 11.7 (and later versions). Please run nvidia-smi and provide the result.
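
A minimal sketch of how to collect that information from inside the container (nvidia-smi is the command requested above; the Python one-liner is the same check listed in the Environment section):

    # Driver version and the highest CUDA version the installed driver supports
    nvidia-smi

    # CUDA version the bundled PyTorch was built against, and whether a GPU is visible
    python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"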

taoqinghua commented 5 months ago

> If you are using the provided Docker image with the tag qwenllm/qwen(:latest), it is based on CUDA 11.7 and bundles the layer_norm module from FlashAttention v2, which is where that invalid device function (cudaOccupancyMaxActiveBlocksPerMultiprocessor, a CUDA runtime API) is called.
>
> It is likely that your NVIDIA driver is too old to support CUDA 11.7 (and later versions). Please run nvidia-smi and provide the result.

The nvidia-smi output is below. The driver looks new enough to support CUDA 11.7, so could something else be the cause?

Wed Apr 10 06:16:11 2024
NVIDIA-SMI 545.23.06    Driver Version: 545.23.06    CUDA Version: 12.3

GPU  Name                  Bus-Id            Temp  Perf  Pwr:Usage/Cap  Memory-Usage     GPU-Util  Compute M.
0    Tesla P100-PCIE-16GB  00000000:44:00.0  27C   P0    29W / 250W     0MiB / 16384MiB  0%        Default
1    Tesla P100-PCIE-16GB  00000000:87:00.0  27C   P0    28W / 250W     0MiB / 16384MiB  0%        Default
2    Tesla P100-PCIE-16GB  00000000:C1:00.0  26C   P0    30W / 250W     0MiB / 16384MiB  0%        Default
3    Tesla P100-PCIE-16GB  00000000:C4:00.0  26C   P0    29W / 250W     0MiB / 16384MiB  0%        Default
(All GPUs: Persistence-M Off, Disp.A Off, Volatile Uncorr. ECC 0, MIG M. N/A)

Processes: no running processes found.

jklj077 commented 5 months ago

Unfortunately, FlashAttention v2 does not support the P100 (nor the V100). You may need to uninstall the related packages in the image (pip uninstall flash_attn dropout_layer_norm) or build the image from scratch with the environment variable BUNDLE_FLASH_ATTENTION set to false.
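
A minimal sketch of the uninstall workaround, run inside the container; the compute-capability check is only an assumed sanity check (FlashAttention 2 generally targets compute capability 8.0 or newer, while the P100 reports 6.0):

    # Remove the bundled FlashAttention kernels so training falls back to the
    # plain PyTorch attention / layer-norm code paths
    pip uninstall -y flash_attn dropout_layer_norm

    # Sanity check: print the GPU's compute capability (a P100 prints (6, 0))
    python -c "import torch; print(torch.cuda.get_device_capability(0))"

How BUNDLE_FLASH_ATTENTION is passed when rebuilding the image depends on the repository's Docker build script and is not shown here.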

taoqinghua commented 5 months ago

> Unfortunately, FlashAttention v2 does not support the P100 (nor the V100). You may need to uninstall the related packages in the image (pip uninstall flash_attn dropout_layer_norm) or build the image from scratch with the environment variable BUNDLE_FLASH_ATTENTION set to false.

Thank you.