WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` #1210

Closed marblech closed 1 year ago

marblech commented 1 year ago

Hi, I just downloaded the code and ran the test `python test.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val`, and it showed this error:

Namespace(augment=False, batch_size=32, conf_thres=0.001, data='data/coco.yaml', device='0', exist_ok=False, img_size=640, iou_thres=0.65, name='yolov7_640_val', no_trace=False, project='runs/test', save_conf=False, save_hybrid=False, save_json=True, save_txt=False, single_cls=False, task='val', v5_metric=False, verbose=False, weights=['yolov7.pt'])
YOLOR 🚀 v0.1-116-g8c0bf3f torch 1.13.0+cu117 CUDA:0 (Tesla P40, 22919.125MB)

Fusing layers...
Traceback (most recent call last):
  File "test.py", line 319, in <module>
    test(opt.data,
  File "test.py", line 58, in test
    model = attempt_load(weights, map_location=device)  # load FP32 model
  File "/home/hhee/yolov7/models/experimental.py", line 253, in attempt_load
    model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
  File "/home/hhee/yolov7/models/yolo.py", line 703, in fuse
    m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
  File "/home/hhee/yolov7/utils/torch_utils.py", line 199, in fuse_conv_and_bn
    fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

This is my runtime environment:

Collecting environment information...
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 8.4.0-3ubuntu2) 8.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.6.55
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla P40
GPU 1: Tesla P40

Nvidia driver version: 510.85.02
cuDNN version: Probably one of the following:
/usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.13.0
[pip3] torchvision==0.14.0
[conda] Could not collect

How can I solve this problem? Thanks.
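The failing call in the traceback is a float32 `torch.mm` on the GPU, so one way to separate the problem from the YOLOv7 code is to run the same kind of matrix product on its own. The following is only a minimal sketch: the function name `cublas_smoke_test` and the matrix size are made up for illustration, and a passing result does not prove the environment is healthy.

```python
def cublas_smoke_test(n=64):
    """Run one small float32 matrix product on the GPU.

    Returns True if it succeeds, False if it raises a RuntimeError,
    and None if torch or a CUDA device is not available.
    The name and the size n are illustrative, not part of YOLOv7.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    # Same shape pattern as in fuse_conv_and_bn: (n, n) times (n, 1).
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, 1, device="cuda")
    try:
        out = torch.mm(a, b)
        # CUDA calls are asynchronous; wait so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        print(f"GPU matrix product failed: {exc}")
        return False
    return tuple(out.shape) == (n, 1)


result = cublas_smoke_test()
print("cuBLAS smoke test:", result)
```

If this small test fails with the same `CUBLAS_STATUS_INVALID_VALUE` message, the cause is more likely in the PyTorch/CUDA installation than in the model or the weights.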

HerrAskinSM commented 1 year ago

Hello! I have the same problem with python detect.py ...

marblech commented 1 year ago

I have checked some documentation. I think the cause is that the installed CUDA version is lower than the one this PyTorch build needs, so I am updating my GPU driver and CUDA toolkit to 11.7. I hope that solves the problem.
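The mismatch described here can be read straight from the environment report: PyTorch was built against CUDA 11.7 while the installed CUDA runtime is 11.6.55. A small sketch of that comparison follows, with hypothetical helper names (`parse_version`, `toolkit_older_than_build`). Whether the version gap is the actual cause of the cuBLAS error is the poster's hypothesis, not something this check proves.

```python
def parse_version(text):
    """Turn a string such as '11.6.55' into (11, 6); only major.minor are compared."""
    parts = text.strip().split(".")
    return int(parts[0]), int(parts[1])


def toolkit_older_than_build(build_cuda, toolkit_cuda):
    """True if the installed CUDA toolkit is older than the CUDA version PyTorch was built with."""
    return parse_version(toolkit_cuda) < parse_version(build_cuda)


# Values copied from the environment report in this issue.
# On a live machine the build version is available as torch.version.cuda.
build_cuda = "11.7"       # "CUDA used to build PyTorch"
toolkit_cuda = "11.6.55"  # "CUDA runtime version"

mismatch = toolkit_older_than_build(build_cuda, toolkit_cuda)
print(f"PyTorch built with CUDA {build_cuda}, toolkit is {toolkit_cuda}, older toolkit: {mismatch}")
```

When the check reports an older toolkit, the two fixes reported in this thread are to upgrade the driver and toolkit or to install a PyTorch build that matches the installed CUDA version.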

HerrAskinSM commented 1 year ago

marblech, thank you very much! You pointed me to a solution. In my case I downgraded torch to 1.12.1 instead, and everything worked (requirements.txt only asks for torch>=1.7).

gioriog commented 1 year ago

Hello, I had the same issue and resolved it as @serOMENdev suggested. Here you can select the correct versions of torch and torchvision for your CUDA version.

marblech commented 1 year ago

Hello. I solved this problem by upgrading the GPU driver and CUDA to version 11.7. The problem is fixed, so I am closing the issue.

lihaogang commented 1 year ago

Run this command: `unset LD_LIBRARY_PATH`
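Before unsetting the variable it can help to see which entries in it point at a system CUDA install, since those directories may be searched ahead of the libraries that ship with the PyTorch wheel. This is a rough sketch only: the helper names are invented, the example value is hypothetical (the issue does not show the poster's actual `LD_LIBRARY_PATH`), and matching on the substring "cuda" is a heuristic.

```python
import os


def cuda_path_entries(ld_library_path):
    """Return the LD_LIBRARY_PATH entries whose path mentions 'cuda' (a heuristic)."""
    entries = [p for p in ld_library_path.split(":") if p]
    return [p for p in entries if "cuda" in p.lower()]


def env_without_ld_library_path(environ):
    """Copy an environment mapping, dropping LD_LIBRARY_PATH (the effect of `unset`)."""
    return {k: v for k, v in environ.items() if k != "LD_LIBRARY_PATH"}


# Hypothetical value for illustration; the directory name is taken from the
# cuDNN paths in the environment report, not from the poster's shell.
example = "/usr/local/cuda-11.6/targets/x86_64-linux/lib:/opt/other/lib"
print(cuda_path_entries(example))

# Inspect the current process environment as well.
print(cuda_path_entries(os.environ.get("LD_LIBRARY_PATH", "")))
clean_env = env_without_ld_library_path(os.environ)
```

The cleaned mapping can be passed as the `env` argument of `subprocess.run` to launch `test.py` without the variable, which leaves the current shell untouched.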

LightSun commented 1 month ago

I resolved it by linking the libtorch library first.