Cool idea! Proud to submit a first bug report :)

On Ubuntu with CUDA 12.1, Torch 2.2.2, and an Nvidia 4090, a call to cublas_ops.cublas_half_matmul_batched_simple fails with this stack trace:

** On entry to HgemmStridedBatched parameter number 10 had an illegal value
cuBLAS API failed with status 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/cublas_ops/__init__.py", line 39, in cublas_half_matmul_batched_simple
    return _cublas_hgemm_batched_simple(a, b)
RuntimeError: cuBLAS API failed

but this code works:

>>> y = torch.ones([1, 8192, 28672], dtype=torch.float16, device="cuda:0")
>>> z = cublas_ops.cublas_half_matmul_batched_simple(x, y)